Function | Description | Importance |
Supports Embedded Metadata | Allow metadata to be embedded into documents. | While there are many ways to assign metadata, the metadata itself should also be embedded directly into content so that the metadata travels with the data. |
Supports Custom Taxonomies | Allow users of content management systems to create their own taxonomy and implement controlled vocabularies. | Metadata itself should have structure. While crowdsourced metadata such as hashtags works well for Twitter, a company can benefit from a more refined approach to cataloging its data. |
Auto Classification | Allow for content to be automatically classified based on rule types, taking into account advanced vocabulary support. | Auto classification is not always black and white. It needs to be flexible and capable of evaluating multiple rules beyond simple content matching in order to determine meaning. Take a product document as an example: the footer may contain the names of the company's other products, so we cannot classify by product name alone and must look at other factors. The same is true of determining sensitivity. If we find an address by itself, does that constitute a PII violation, or do we need to classify based on complex rules? We need complex rules, and you cannot auto classify without this capability. |
Allow User to Enter Metadata | Users can add metadata to the document. | User-generated metadata is important. Our example taxonomy in this post includes Description, which is an obvious case for user-entered metadata. |
Ensure Quality of User Selected Metadata | Since users can add metadata, a classification system should be able to identify errors and correct them if necessary. | In any system where a user can enter metadata, there needs to be a way to assure quality. The system needs to validate that the selection from a controlled vocabulary is in line with the actual content. For example, what if an author selected Public for the sensitivity setting and the automatic classification system found employee payroll data? The system should identify this user error and then alert someone, optionally correct the metadata automatically, or quarantine the content for further review. |
Ensure Quality of User-entered Metadata | In freeform text fields, users can enter metadata, and the system needs to validate that the text does not violate policy and matches the document. | Problems range from honest classification errors to keyword stuffing. Since the invention of the search engine, people have tried to manipulate their metadata so that their content ranks higher and is read first, even if that means misrepresenting data or stuffing keywords wherever possible. The metadata management system must prevent these types of author innovations! |
Transfer Security-enabled Metadata | The encryption level must be determined from the sensitivity of the content, to provide content with a site-specific classification. | Referring back to our sensitivity controlled vocabulary, we can examine further qualities of the transport to determine whether the content is being viewed properly. This lets us take further action if a document belongs to a protected class and the system finds that it is not protected. In that case, the system should be able to move the document to a correct, user-specified location. |
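To make the quality and transport checks in the table above more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sensitivity levels, the payroll pattern, the action names, and the function names are hypothetical and are not Compliance Guardian's actual API. It compares a user-selected sensitivity value with what a simple detector finds in the content, and it flags protected content that is served over an insecure connection.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical controlled vocabulary, ordered from least to most restrictive.
SENSITIVITY_LEVELS = ["Public", "Internal", "Confidential"]

# Hypothetical detector; a real system would combine many richer rules.
PAYROLL_PATTERN = re.compile(r"\b(payroll|salary|gross pay)\b", re.IGNORECASE)


@dataclass
class Finding:
    user_value: str      # what the author selected
    detected_value: str  # what the automatic check found
    action: str          # "accept" or "quarantine"


def detect_sensitivity(text: str) -> str:
    """Return the sensitivity implied by the content itself."""
    return "Confidential" if PAYROLL_PATTERN.search(text) else "Public"


def validate_user_sensitivity(user_value: str, text: str) -> Finding:
    """Quarantine content whose detected sensitivity exceeds the user's choice."""
    detected = detect_sensitivity(text)
    user_rank = SENSITIVITY_LEVELS.index(user_value)
    detected_rank = SENSITIVITY_LEVELS.index(detected)
    action = "quarantine" if detected_rank > user_rank else "accept"
    return Finding(user_value, detected, action)


def check_transport(sensitivity: str, url: str) -> str:
    """Return "relocate" when protected content is not served over HTTPS."""
    is_protected = sensitivity != "Public"
    is_secure = urlparse(url).scheme == "https"
    return "relocate" if is_protected and not is_secure else "ok"


finding = validate_user_sensitivity("Public", "Employee payroll data for Q3")
print(finding.action)
print(check_transport("Confidential", "http://intranet.example.com/payroll.docx"))
```

Quarantining is only one of the possible responses named in the table; the same comparison could instead raise an alert or overwrite the user's value automatically.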
Rule Type | Description | Importance |
Does Text Exist | Locates text in a document or in document metadata and uses its presence to assign a metadata element (Tag) value. | The existence of text or keywords has always been a prime way to determine meaning or relationships. With controlled vocabularies, this becomes even more important, as we can have rules both for classifying content and for validating human classification based on what is actually found in the content versus the metadata. |
Conditional Text | Like the Does Text Exist rule, locates text in a document or in document metadata to assign a metadata element (Tag) value. A match can also be used as a condition to look for another word or string. | Building on the Does Text Exist rule, this provides a more complex way of looking at text relationships to determine classification. |
Dictionaries | Another text-based rule that determines whether the system can assign a controlled vocabulary term based on the existence of one or many words. | Most classifications and terms become complex, and there can be a long list of terms whose presence or absence determines classification. |
Element Validation | Whether user-entered metadata or structural elements, one can find meaning from element-based data. This check looks into element and attribute data to find meaning. | Be it HTML, XML, or XAML, we can find meaning from data and/or location. Certain words may have more value if found in Headers, Footers, or H1 Tags. This data needs to be explored and evaluated in order to better define meaning. |
Enhanced Elements | This rule type looks deeper into the structure of content to find information about elements nested within other elements. | Much like the Conditional Text rule, elements within elements can be used to find meaning. |
External Content | Searches content for items that are only references to content hosted on external sites. | This is a special rule type that identifies mixed content, which can in turn be used to classify a document or assign a controlled vocabulary term. |
RegEx | Locates a regular expression match in a document or in document metadata and uses it to assign a metadata element (Tag) value. | This provides a powerful method to classify content based on pattern matching. |
Conditional RegEx | A regular expression match in a document or in document metadata triggers another regular expression pattern search, and the result is used to assign a metadata element (Tag) value. | In some organizations, it is essential to find not just one item but pairs of items in the same content or structure. |
Transport | Tests whether, and to what extent, the HTTPS protocol is used to serve content. | Many classifications, and actions based on classification, depend on the type of communication. For example, we may transmit secure and sensitive information, but only if the site's connection is secure. A classification system should support classification-based rules, and these rules must be able to ask questions about the site. |
Cookie | Aligned or Connected Data attached to content can be tested to determine classification. | If content has or exposes PII, it is important to classify the document properly. Cookies can determine content sensitivity. |
Custom | A custom check type allows external functions to run if a condition is found. | In some cases, classification rules become more complex. Is something near something else or far from something else? Does a number match some checksum? Does some complex relationship exist? The custom check isn't meant to find the things we already thought of. It is meant to classify based on all of the content combinations that we could not anticipate in a classification system. |
Batching/Super Rules | Combines one or more rule types in sequential logic to produce controlled vocabulary or simple classification outcomes. | Content can be classified against the controlled vocabularies in a taxonomy by batching together one or more of the above rule types to create a super rule. |
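As a rough illustration of how several of the rule types above could fit together, here is a minimal Python sketch. All names in it are hypothetical and chosen for this example; it does not reflect how Compliance Guardian implements its rules. Each rule is modeled as a function from document text to a yes/no answer, the custom check uses the Luhn checksum as one example of "does a number match some checksum?", and the super rule assumes that every batched rule must match (other combinations, such as any-of, are equally plausible).

```python
import re
from typing import Callable, List

Rule = Callable[[str], bool]  # a rule takes document text and reports a match


def text_exists(term: str) -> Rule:
    """Does Text Exist: true when the term appears anywhere in the text."""
    return lambda text: term.lower() in text.lower()


def conditional_text(first: str, second: str) -> Rule:
    """Conditional Text: the second term only counts if the first is present."""
    return lambda text: first.lower() in text.lower() and second.lower() in text.lower()


def dictionary(terms: List[str], minimum: int = 1) -> Rule:
    """Dictionaries: true when at least `minimum` of the listed terms appear."""
    def rule(text: str) -> bool:
        lowered = text.lower()
        return sum(term.lower() in lowered for term in terms) >= minimum
    return rule


def regex(pattern: str) -> Rule:
    """RegEx: true when the pattern matches somewhere in the text."""
    compiled = re.compile(pattern)
    return lambda text: compiled.search(text) is not None


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits in `number`; separators are ignored."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def custom_card_number() -> Rule:
    """Custom: find 13 to 16 digit candidates and keep those that pass Luhn."""
    candidate = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
    return lambda text: any(luhn_valid(m.group()) for m in candidate.finditer(text))


def super_rule(rules: List[Rule]) -> Rule:
    """Batching/Super Rule: evaluate rules in order; all of them must match."""
    return lambda text: all(rule(text) for rule in rules)


# Usage: tag a document as payment data only if it mentions an invoice
# AND contains a number that passes the checksum.
payment_rule = super_rule([text_exists("invoice"), custom_card_number()])
print(payment_rule("Invoice 1001. Card: 4111 1111 1111 1111"))
```

Because `all` stops at the first rule that fails, cheap rules such as `text_exists` can be listed first so that more expensive checks run only when needed.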
This blog post should be seen as a simple primer on both metadata and the classification capabilities of Compliance Guardian. It is by no means a comprehensive listing of features or definitions.
For more information on Compliance Guardian, or if you need help setting up your classification system, please contact AvePoint Sales. We have a group of world-renowned experts in this field who can work with you to develop your taxonomy and controlled vocabularies as well as write your rules and actions. It is obvious that data is important, but metadata is even more important. Everyone can benefit from learning more about metadata, and every company can benefit in some way from implementing a metadata policy, whether it relates to customer service, employee education, or a DLP effort. Please feel free to contact me as well if you have any questions related to this blog post!