Stale content accumulates slowly but surely in every Microsoft SharePoint environment. Whether your end users regularly upload unnecessary or irrelevant content to team sites, collaborate so much on documents that each file has over a hundred versions, or simply have too much data that has become obsolete, it is important to prune stale items in order to free up infrastructural resources for the more important, business-critical content.
One of the many advantages of our DocAve Storage Manager and DocAve Archiver tools for SharePoint storage optimization has always been the wide offering of flexible criteria for externalization of data from SQL Server content databases. Users can configure rules to offload binary large objects (BLOBs) according to a property or combination of properties such as:
- File name, e.g. anything containing the word “draft”
- Author, e.g. anything created by an employee who left the company two years ago
- Old versions, e.g. keeping only the most recent five versions of each document in the content databases
- Any custom metadata value, for further granular identification of content that should or should not be externalized
The date and time when a file was last accessed by a SharePoint end user is arguably one of the most popular rules for BLOB externalization. Last accessed time can often be a better indication of a document’s relevance and worth to the organization than last modified time, because employees may still be opening and referencing the document on a regular basis even if nobody is actually modifying its contents. However, while file systems inherently track last accessed time as a metadata value, in SharePoint this information can only be derived from the audit records. SharePoint administrators are often reluctant to enable SharePoint auditing, especially for view events, due to the storage burden of the resulting audit data.
Last Accessed Time in DocAve: A History
Some of you DocAve veterans may recall that AvePoint has provided a last accessed time rule – relying on the SharePoint Auditor – since the days of DocAve 5 Archiver. Ideally, though, SharePoint administrators should not be keeping all of the audit logs in the content databases indefinitely. Audit data tends to grow very quickly, which could eventually start to affect SharePoint performance, especially in larger deployments with many users and a lot of activity.
In DocAve 6 Service Pack 1, DocAve Storage Manager and DocAve Archiver featured a last accessed time criterion that tracked this data in our own DocAve stub database. Since it did not require the storing or summarizing of logs, it was extremely efficient, but it only worked on BLOBs that had already been externalized by DocAve. As such, it was geared more toward the management of tiered storage rules for already existing stubs and BLOBs. Meanwhile, for newly created stubs, the last accessed time of the stub would be updated to the stub creation time (that is, the time when the BLOB was externalized).
Now with DocAve 6 Service Pack 3, we offer an improved last accessed time rule. As in DocAve 5, this feature uses SharePoint audit data, only now it leverages the DocAve Report Center’s Audit Controller. This is more beneficial to the administrator because the Audit Controller retrieves the audit logs from SharePoint content databases and stores them in our own report database, so we can query audit data without touching SharePoint itself. DocAve Report Center also provides the ability to prune the audit data from our report database and store it on a file share, outside of SQL Server, for data retention and storage optimization.
A common concern around last accessed time data is that this timestamp typically gets updated by search crawls or other service account activity, preventing it from being a true indicator of when a file was last read by an actual user. DocAve Report Center’s Audit Controller addresses this pain point by allowing for selective, flexible retrieval of data from the SharePoint Auditor database. When a user configures an audit plan, he or she can enter the names of specific accounts to exclude.
For example, if avepointadmin and avepointda_service are entered as excluded accounts, no activity by either account would be collected from SharePoint into DocAve’s own report database, and it would therefore not appear in the Report Center Auditor Reports. As far as DocAve is concerned, there is no audit log activity by avepointadmin or avepointda_service. Therefore, with the Track last accessed time checkbox selected, activity by these two accounts would also not be factored in when DocAve Storage Manager and DocAve Archiver scan the files within this scope for the date and time of their last “real” access.
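The effect of excluding accounts can be illustrated with a minimal sketch. It is not DocAve code and does not reflect the actual schema of the SharePoint Auditor or the DocAve report database; the AuditEvent record, its fields, and the last_real_access function are hypothetical names chosen for illustration. The idea is simply that the last “real” access of a file is the latest audit event left once the excluded accounts are filtered out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical, simplified audit record (not the real audit schema)."""
    file_url: str
    account: str
    occurred_at: datetime


def last_real_access(
    events: Iterable[AuditEvent],
    excluded_accounts: Iterable[str],
) -> Dict[str, datetime]:
    """Return the latest event time per file, ignoring excluded accounts."""
    # Compare account names case-insensitively (an assumption of this sketch).
    excluded = {name.lower() for name in excluded_accounts}
    latest: Dict[str, datetime] = {}
    for event in events:
        if event.account.lower() in excluded:
            continue  # crawl or service account activity is not a real access
        current = latest.get(event.file_url)
        if current is None or event.occurred_at > current:
            latest[event.file_url] = event.occurred_at
    return latest


if __name__ == "__main__":
    sample = [
        AuditEvent("/sites/hr/policy.docx", "jsmith", datetime(2013, 1, 15)),
        AuditEvent("/sites/hr/policy.docx", "avepointda_service", datetime(2013, 9, 1)),
    ]
    # The service account's September event is ignored, so the January
    # access by jsmith is reported as the last real access.
    print(last_real_access(sample, ["avepointadmin", "avepointda_service"]))
```

A file touched only by excluded accounts does not appear in the result at all, which matches the behavior described above: from the point of view of the rule, that activity never happened.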
As you may have realized, a Compliance Reporting license (already included if you have a DocAve Report Center license) is required for the user to be able to set up an audit plan and take advantage of this externalization rule. When equipped with the audit records in SharePoint, you can now capture a complete picture of stale content throughout the environment, not just content that has already been externalized. Without Compliance Reporting, the DocAve system cannot pull SharePoint Auditor data, and DocAve Storage Manager and DocAve Archiver modules will not be able to track objects’ true last accessed time.
If you do not have a Compliance Reporting or DocAve Report Center license, you can still externalize content based on its last accessed time, but in that case DocAve will use the stub-tracking method from Service Pack 1 described above. For more information about this configuration file option, please refer to Appendix E of the DocAve 6 Archiver user guide.