The Freedom of Information Act (FOIA) is a federal law that gives the public the right to request federal agency records. All federal agencies, including EPA, are required to make requested records available unless the records are protected from disclosure by certain FOIA exemptions.
However, federal agencies often struggle to respond to these requests in a timely manner. This is often part of a larger, systemic issue with “data life cycle management practices.” It’s not for lack of trying. In reality, many of these agencies have paper-based records management and data retention policies that are largely unenforced, for a number of reasons. First and foremost, these policies are often written by legal and compliance folks who know very little about a day in the life of the normal “business user” within their agencies.
Second, these policies are often written without consultation and advice from IT and Security, and thus they do not always reflect either what is technically possible to enforce or what everyday workers in these organizations are actually doing. As a result, the likelihood of compliance is slim, if compliance is possible at all. Left to their own devices, government end users will most likely make poor or selfish decisions when it comes to data management (as they do with security). Most end users believe that their information is critically important.
They tend to keep it for longer than necessary with the thought that they “might need it again someday.” And finally, they often keep it where it is easiest for them to access, rather than in properly secured places across their networks. This can lead to a proliferation of data across corporate and personal networks and devices. It can also lead to the loss of good knowledge management and critical corporate intellectual property, and to increased security and privacy risk.
In reality, there is VERY little information that should live forever, and most, if not all, data should be subject to very specific and prescriptive life cycle management practices. Data (like people) should have a beginning, a middle, and an end. Whether data is generated by and within your agency or collected by your agency from a third party (another federal agency, government partner, citizen, vendor, or other), the only way you can effectively protect it is by understanding it. Does it contain citizen information, employee information, intellectual property, sensitive communications, personally identifiable information, health information, financial data, etc.?
Data without controls can create operational, privacy, and security gaps that could put company assets at risk. Only once you know what it is, where it is, who can access it, and who has accessed it can you make decisions about where it should live. Data in a highly secure system may need fewer controls than data located in a cloud environment or on a broadly available corporate intranet or Web site. Data sovereignty rules also dictate what controls are needed, including where the data must be located, what should be kept on premises, and when you can or should move to the cloud.
Implementing a Best Practice Approach for FOIA Management
So what does this look like in practice? In a standard agency, data is created or collected by your organization, used by the organization, shared within the organization or by the organization with others, and then ultimately it should have a disposition (in compliance with any regulatory or statutory records management requirements, of course). The longer you have the data, the more “at risk” you are of having that data breached or shared inappropriately.
There are some key considerations you must address before you start the process. First, you must understand how data is created or collected by your agency. You should think about excessive collection, how you will provide notice to individuals about that collection, how you will provide them with appropriate levels of choice, and how you will keep appropriate records of that collection and creation.
Next, you should think about how you are going to use and maintain this data. Here you should consider inappropriate access. You should ensure that the data subjects’ choices are being properly honored and address concerns around a potential new use or even misuse. Also, consider how to address concerns around breach, and ensure that you are properly retaining the data for records management purposes. Consider who is going to share this data, and with whom. You should consider data sovereignty requirements and cross-border restrictions along with inappropriate, unauthorized, or excessive sharing.
Finally, remember that all data must have an appropriate disposition. You should keep data for as long as you are required to do so for records management, statutory, regulatory, or compliance requirements, and ensure you are not inadvertently disposing of it. At the same time, for as long as you hold sensitive data, you run the risk of breach. Once you’ve answered these questions, it’s time to implement your program operationally. Here is how that kind of program would work in four simple steps.
Four Easy Steps for Freedom of Information Act Compliance
- Discover and classify your data: The first step to take when it comes to properly disposing of data is to determine what type of data you have. A common data classification schema, for example, classifies data as public, internal, sensitive, or restricted. The classification of the data dictates its disposal method. This does not have to be completed as an all-or-nothing effort, but rather can be done through a phased approach and as part of an initial “discovery” project across a limited scope of data to help build the business rules that can then be disseminated across the organization’s data repositories. The goal of the classification in this step is to get far enough along so that you can proceed with the second step.
- Determine the retention: Once you have determined the classification of your data, you need to determine whether it is subject to any retention periods. Region- and government-specific laws and regulations, requirements of accrediting and other external agencies, and prudent management practices govern the retention and disposal of organizational records. These records must be retained appropriately and disposed of in a timely manner to meet the requirements of external regulations.
- Assign historical value: After you have determined that data in your possession is not subject to any retention period, it is important to evaluate whether the documents have any historical or archival purpose for the organization. In some instances, data ready to be disposed of may contain information with enduring legal, fiscal, research, or historical value, and should be retained and preserved indefinitely.
- Appropriately dispose of files: Finally, after data in your possession is classified and reviewed for retention and archival purposes, and it is determined that the data can be properly discarded, the last step is to dispose of your data in the appropriate manner.
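The four steps above can be sketched as a single decision function. This is only a minimal illustration in Python, not a records-management implementation: the `Record` fields, the `RETENTION_YEARS` table, and the disposal method names are all hypothetical, and real retention periods would come from your agency's records schedule.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Classification(Enum):
    # The example schema from step 1.
    PUBLIC = "public"
    INTERNAL = "internal"
    SENSITIVE = "sensitive"
    RESTRICTED = "restricted"


# Hypothetical retention periods in years, keyed by record type (step 2).
RETENTION_YEARS = {"correspondence": 3, "contract": 7}

# Hypothetical disposal method for each classification (step 4).
DISPOSAL_METHOD = {
    Classification.PUBLIC: "delete",
    Classification.INTERNAL: "delete",
    Classification.SENSITIVE: "secure_wipe",
    Classification.RESTRICTED: "secure_wipe_with_certificate",
}


@dataclass
class Record:
    name: str
    record_type: str
    classification: Classification      # step 1: outcome of discovery
    created: date
    has_historical_value: bool = False  # step 3: decided by a reviewer


def disposition(record: Record, today: date) -> str:
    """Return 'retain', 'archive', or a disposal method for one record."""
    years = RETENTION_YEARS.get(record.record_type)
    if years is None:
        # No schedule known for this type: keep it until one is assigned.
        return "retain"

    # Step 2: compare (year, month, day) tuples so that a 29 February
    # creation date never produces an invalid calendar date.
    expires = (record.created.year + years, record.created.month, record.created.day)
    if (today.year, today.month, today.day) < expires:
        return "retain"

    # Step 3: records with enduring value are preserved, not destroyed.
    if record.has_historical_value:
        return "archive"

    # Step 4: the classification dictates the disposal method.
    return DISPOSAL_METHOD[record.classification]
```

Keeping the retention check ahead of the historical-value check mirrors the order of the steps: nothing is considered for archiving or disposal until its retention period has run out.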
A good program must continually assess and review who needs access to what types of information. Additionally, organizations should work with their IT counterparts to automate controls around their enterprise systems to make it easier for employees to do the right thing than it is to do the wrong thing or to simply neglect the consequences of their actions. Once you’ve implemented your plan, be sure that you maintain regular and ongoing assessments.
Returning to classification, there has been a long-running debate over end-user versus automated tagging. There are countless benefits to properly tagged content: it’s more organized, easier to find, and optimized for search and indexing, and when classification is also used, the content can be more easily protected. While common sense might suggest that a document author is the best person to tell you what their document is about, end users are also notoriously poor at tagging their own documents.
This is the case for a number of reasons. First, going into the properties of an Office document or a PDF is an extra step, and it’s something that requires knowledge, discipline, and interest to complete. Second, end users often simply don’t recognize that if they DON’T go through the process of tagging a document themselves, metadata will, in some cases, be automatically assigned into properties with or without their input. This can lead to a big problem if a document has inherited the properties or metadata of a past version. Say, for example, you are modifying an existing proposal that you wrote for a new customer, or borrowing a document that was written by someone else to build into your own content.
That document’s metadata properties could well contain embarrassing or potentially highly sensitive information, such as customer names, personally identifiable information, or even possible trade secrets. Third, many companies fall into the trap of building extremely cumbersome classification policies and procedures, with the best of intentions but possibly dire consequences. So they go through the process of delineating between public information (data that can be shared with anyone), internal information (data that is not highly sensitive, but that should not be shared outside of the company), and then either confidential or highly confidential data (data that should be protected because of harm to individuals or to the company if it’s exposed).
All of this sounds quite logical so far. However, when you couple this kind of schema with additional barriers that require end users to take many more steps to work with the confidential or highly confidential data, you run the risk of pushing them to under-classify their content in order to do their jobs effectively and easily. So what is the solution?
Well, most obviously, automated tagging eliminates guesswork and the problem of end users under-classifying their content.
Automated tagging tools can help eliminate guesswork, human error, and even the likelihood that an end user may try to take shortcuts around your technical security controls to get their job done more quickly. The Compliance Guardian Solution from AvePoint automates classification, allowing you to properly tag and identify data on collection. This will assist you in every step of the e-discovery and data life cycle management process.
However, it is Compliance Guardian’s rich search facility that will truly empower your organization to accelerate its e-discovery and FOIA response time. Compliance Guardian’s rich rules engine allows the software to search for traditional keywords, patterns, and regular expressions in content and context, and our Machine Learning capability can also be a very useful tool for responding to FOIA requests.
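To make keyword and pattern searching concrete, here is a minimal sketch in Python. It does not reflect Compliance Guardian's rules engine or its API. The rule labels, the keywords, and the patterns in `RULES` are hypothetical examples chosen only to show the general technique.

```python
import re

# Hypothetical rules: each label maps to a compiled regular expression.
RULES = {
    # A U.S. Social Security number written as 123-45-6789.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # A simple (not exhaustive) email address pattern.
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Keywords that a particular request might ask about.
    "foia_keyword": re.compile(r"\b(?:pipeline permit|water quality)\b", re.IGNORECASE),
}


def scan(text):
    """Return a dict of every rule that matched, with the matched strings."""
    hits = {}
    for label, pattern in RULES.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits
```

Calling `scan` on the text of each candidate document returns only the rules that fired, which is enough to decide whether the document needs a closer look.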
Machine learning can recognize similar types of content from provided samples and generate models to classify new files. Some typical examples are ITAR-related documents, source code, and documents on the same subject.
To train and build the machine learning models, Compliance Guardian’s native Content Classification Tool can be used. Once the training process is complete, the model can be used to predict/scan new files.
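The train-then-predict workflow can be illustrated with a deliberately simple stand-in. The sketch below is not how Compliance Guardian or its Content Classification Tool builds models. It swaps in a basic bag-of-words approach with cosine similarity, and the function names and sample labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Build one summed word-count vector per label.

    samples maps a label to a list of example texts for that label.
    """
    model = {}
    for label, texts in samples.items():
        counts = Counter()
        for text in texts:
            counts.update(tokenize(text))
        model[label] = counts
    return model


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(model, text):
    """Return the label whose training vector is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(model, key=lambda label: cosine(vector, model[label]))
```

A model trained this way only knows the labels it was given samples for, so a production tool would also need a confidence threshold for files that resemble none of them.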
While Compliance Guardian is helping you find “relevant” information subject to a FOIA request, it is also eliminating duplicate, redundant, and repetitive information through file analysis. File analysis allows Compliance Guardian to quickly identify redundant, obsolete, or trivial (ROT) data as well as duplicate information. Using this information, Compliance Guardian can reduce the data that needs to be scanned, improve response times, and reduce risk, whether or not the data ultimately remains relevant to the request.
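One common way to find exact duplicates is to group files by a content hash. The sketch below assumes file contents are already loaded as bytes, and the function name `find_duplicates` is hypothetical. It illustrates the general idea only, not Compliance Guardian's file analysis, and it does not attempt to detect near-duplicates or obsolete and trivial data.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    files maps a file name to its content as bytes. The result is a list
    of groups, each holding two or more names that share the same content.
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    return [names for names in by_digest.values() if len(names) > 1]
```

Only one file from each group then needs to be scanned and reviewed, which is where the reduction in scanning volume comes from.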
Finally, Compliance Guardian has a series of unique data protection techniques that allow you to safely respond to a FOIA request, including built-in content redaction. Once documents and files are found that must be analyzed or returned as part of a FOIA response, agency staff and lawyers may still need to spend thousands of hours redacting sensitive, protected, or classified content from those documents before they can be shared.
Here, Compliance Guardian will automate this process with auto-redaction based on easily configurable rules. Thus, information can be discovered, mapped, classified, examined, redacted, and protected as part of the FOIA evaluation and response process.
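As a rough picture of what rule-based redaction involves, here is a minimal Python sketch. It is not Compliance Guardian's redaction feature or its rule format. The `REDACTION_RULES` table and its two patterns are hypothetical, and real redaction of released documents would still need human review.

```python
import re

# Hypothetical redaction rules: label -> pattern to black out.
REDACTION_RULES = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text):
    """Replace every match with a labeled placeholder.

    Returns the redacted text and a count of replacements per rule,
    which can be kept as an audit trail of what was removed.
    """
    counts = {}
    for label, pattern in REDACTION_RULES.items():
        text, replaced = pattern.subn(f"[REDACTED {label}]", text)
        counts[label] = replaced
    return text, counts
```

Returning the counts alongside the text makes it easy to log how much was withheld from each document without storing the sensitive values themselves.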