Many companies develop policies controlling the publishing of data. Compliance can cover both internal- and public-facing content management systems. In many cases, the content management system and its content are lumped together when assessing compliance with these policies and standards. On its face this seems reasonable, but unlike the thousands of lines of source code that make up the content management system, gigabytes (and more) of content cannot be measured so simply. Software Quality Assurance specialists are accustomed to measuring the number of errors relative to the total lines of code, and these metrics help companies determine the appropriate remediation methods. A common error, however, is to apply these same quality assurance methods to content quality assurance (compliance issues). Unlike the relatively static nature of an application, content is dynamic and thus a constantly moving target for compliance and content quality assurance managers.
It is true that software quality assurance engineers look at the number of defects per thousand lines of code to determine next steps and to give management estimates of the resources and time needed to correct software errors. It is also true that it is very easy to run automated testing software against the content management system itself to produce a list of items that need review and defects (bugs) that need remediation. The content compliance story, however, is not so simple. We have to ask: can a simple analysis of errors versus lines of content bring megabytes, let alone terabytes, of data into compliance in any reasonable manner or timeframe? When we consider the resources that would be required, we are left with a severe imbalance: there are far more people creating content than there are people validating that content against policy. Based on today's laws and requirements, however, we cannot just throw up our hands and surrender. Nor should we throw out the lessons learned from Software Quality Assurance (SQA) when we consider Content Quality Assurance (CQA). We can still use the formulas; we just need to change the variables we apply.
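To make the contrast concrete, here is a minimal sketch of the defect-density arithmetic described above. All of the figures (codebase size, defects per thousand lines, minutes per defect) are assumptions chosen for illustration, not sourced data; the point is only how the same formula scales.

```python
# Hypothetical illustration: why defects-per-KLOC math breaks down for content.
# All figures below are assumed for the sake of the example, not sourced data.

def review_hours(total_lines, defects_per_kloc, minutes_per_defect):
    """Estimate remediation effort from a defect-density metric."""
    defects = total_lines / 1000 * defects_per_kloc
    return defects * minutes_per_defect / 60

# A typical application codebase: 500,000 lines at 5 defects/KLOC.
code_hours = review_hours(500_000, 5, 30)

# A content store: 1,000,000 documents averaging 1,000 lines = 1 billion lines.
content_hours = review_hours(1_000_000_000, 5, 30)

print(f"Code review effort:    {code_hours:>12,.0f} hours")
print(f"Content review effort: {content_hours:>12,.0f} hours")
```

The formula itself is fine; it is the `total_lines` variable that explodes when applied to content, which is exactly why the variables, not the formulas, must change.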
Considering this, we have to merge the two worlds a bit and move forward with a plan to assure quality. To do this we have to acknowledge that, in terms of content, we are dealing with the equivalent of billions of lines of code. Even the most advanced automated testing platform, by itself, would be hard pressed to accomplish this task in a timely or useful manner. When we also factor in the human review and resources required, companies reach the realization that they are in an impossible situation if they attack the problem traditionally. CQA requires new thinking, and the place to start is the large data sources (aka Big Data, which Gartner Research defines as "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making") and what we can learn from the data itself.
The first thing we notice about unstructured data is that it is impossible to treat all of it as equal. With large data sets we experience greater problems with sharing, searching, and, of course, performing any kind of analysis or visualization of the data or of the scope of the errors. But we have an opportunity as well: we can draw on our content management system, infrastructure, and management platform capabilities, such as those provided by the DocAve Software Platform, to derive real Business Intelligence – defined by Gartner Research as including the applications, infrastructure, tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance – and then to solve the problem of content quality assurance. When considering compliance, we can use our platform for content management and control. The DocAve Software Platform can deliver knowledge to the right people, who can then handle governance, risk, and compliance (GRC) issues easily, regardless of the size of the content collection.
To demonstrate, consider an enterprise organization. Let's say this company has a collaboration site in which it stores 1,000,000 documents. These documents include, but are not limited to: Microsoft Office files, PDFs, HTML, XML, and text files. The documents are spread across 300 servers and thousands of locations within them. Each document averages 1,000 lines of content, though some have more. Clearly, the compliance task cannot be accomplished via traditional methods. What is our next step in this situation?
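The arithmetic of this scenario can be worked out in a few lines; all inputs come directly from the example above.

```python
# Back-of-the-envelope scale of the example scenario.
documents = 1_000_000        # documents in the collaboration site
avg_lines_per_doc = 1_000    # average lines of content per document
servers = 300                # servers the documents are spread across

total_lines = documents * avg_lines_per_doc
docs_per_server = documents // servers

print(f"Total lines of content: {total_lines:,}")   # 1,000,000,000
print(f"Documents per server:   {docs_per_server:,}")
```

One million modest documents already equal a billion lines of "code" – the scale at which traditional line-by-line review collapses.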
First, we need to understand our content. This is a question of how we manage our large stores of unstructured and/or structured data. Using auditing and discovery tools, content and customization management and deployment tools, and general access solutions, we begin to gain knowledge of our documents and content management systems. These platform capabilities have the obvious benefit of providing us with real knowledge about our data and how it is used. That knowledge can then be delivered as business intelligence to the compliance team.
Data that the team receives from the platform solution can dramatically change the remediation resources required. An infrastructure and platform system may provide metrics showing that sensitive information exists within the system but has not been accessed for a significant period of time, or that the sensitive data resides in a completely secure and protected area of the system. It may further show that sensitive data exists only in draft documents that have not yet been published. These simple examples show how quickly the information the infrastructure provides can dramatically reduce the amount of content that must be reviewed, turning compliance into a manageable task. Imagine millions of errors in documents that have not been accessed in the last 90 days: do you, as a company, remediate or quarantine those documents? Or do you simply configure your platform's backup system to archive qualifying documents?
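The triage logic sketched in this paragraph can be expressed as a simple routing rule. The field names, thresholds, and categories below are illustrative assumptions for this sketch, not the API of DocAve or any particular platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative compliance-triage sketch driven by audit metadata.
# Field names and the 90-day threshold are assumptions for this example.

@dataclass
class Document:
    path: str
    sensitive: bool        # flagged by content discovery
    is_draft: bool         # not yet published
    in_secured_area: bool  # stored in a protected location
    last_accessed: date    # from platform audit logs

def triage(doc: Document, today: date, stale_after_days: int = 90) -> str:
    """Route a document to human review, archival, or deferral."""
    if not doc.sensitive:
        return "defer"       # no policy exposure; no review needed now
    if doc.in_secured_area or doc.is_draft:
        return "defer"       # contained or unpublished; low priority
    if today - doc.last_accessed > timedelta(days=stale_after_days):
        return "archive"     # stale: quarantine via the backup/archive system
    return "review"          # active, sensitive, and exposed: review first

docs = [
    Document("/hr/salaries.xlsx", True, False, False, date(2012, 1, 5)),
    Document("/pr/press-release.docx", False, False, False, date(2012, 6, 1)),
    Document("/legal/contract-draft.docx", True, True, False, date(2012, 6, 10)),
]
for d in docs:
    print(d.path, "->", triage(d, today=date(2012, 7, 1)))
```

Rules like these are how a review queue of a billion lines shrinks to the small fraction of content that is sensitive, published, exposed, and actively used.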
It is clear that as we compare content quality assurance with software quality assurance, the story is different – not only because of differences in use, but also because of the nature of content versus the content management system itself. In addition, data grows constantly and rapidly – and to manage this you need much more than a content scanner; you need a complete platform designed to manage structured and/or unstructured data.