Metadata is information that describes, classifies or identifies a piece of information. Metadata is typically described as a set of attributes that help to describe or classify an object. For example, here are just some of the metadata attributes for me!
In an information management system like SharePoint, these properties are defined as fields that can store values against a particular record. For example, in my case the field “Gender” would store the value “Male” and the “Name” field would store the value “Christopher Woodill”.
These fields can be used to find, filter, and prioritize information in a variety of different ways:
- If a field is unique to each record, I can use it to find a specific record. For example, a social insurance number, passport number or employee ID is commonly used for this purpose.
- If a field's values are standardized, I can use it to browse and filter large lists. For example, filtering a list of people where gender = male would return only the men in my list of profiles.
- Each field can also be searched for important information. For example, the biography attribute may contain 5,000 words describing my professional background, my accomplishments and so on. This text can be indexed so that people's searches return matches found within the field.
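As a rough illustration of the three uses in the list above, here is a minimal Python sketch over a few hypothetical profile records. The field names (`employee_id`, `gender`, `biography`) and the people are invented for the example, and the word index is a deliberately naive stand-in for the full-text indexing that a system like SharePoint performs.

```python
# Hypothetical profile records; names and field names are illustrative only.
profiles = [
    {"employee_id": "E001", "name": "Alex Chen", "gender": "Male",
     "biography": "Consultant who designs search and records solutions."},
    {"employee_id": "E002", "name": "Maria Lopez", "gender": "Female",
     "biography": "Records manager with a background in library science."},
    {"employee_id": "E003", "name": "Sam Patel", "gender": "Male",
     "biography": "Developer who builds intranet portals."},
]

# 1. A unique field acts as a key: index the records by it for direct lookup.
by_id = {p["employee_id"]: p for p in profiles}

# 2. A standardized field supports filtering with exact comparisons.
def filter_by(records, field, value):
    return [r for r in records if r[field] == value]

# 3. A free-text field is indexed word by word so its contents can be searched.
def build_index(records, field):
    index = {}
    for r in records:
        for word in r[field].lower().split():
            word = word.strip(".,;:")
            index.setdefault(word, set()).add(r["employee_id"])
    return index

bio_index = build_index(profiles, "biography")
```

Here `by_id["E002"]` finds one specific person, `filter_by(profiles, "gender", "Male")` narrows the list, and `bio_index["records"]` returns the IDs of everyone whose biography mentions "records".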
Another Example: Email
Let’s look at another example. Here are some of the metadata attributes attached to any email:
Each of these fields can be used to filter, sort or search a list of emails, which we typically do in a program such as Microsoft Outlook.
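The same filter, sort and search operations can be sketched outside of Outlook with Python's standard `email` package. This sketch assumes the standard `From`, `To`, `Subject` and `Date` header fields; the addresses, subjects and dates are made up.

```python
from email.message import EmailMessage

def make_message(sender, subject, date):
    """Build a message carrying a few standard header fields (hypothetical data)."""
    msg = EmailMessage()  # uses the modern email.policy.default
    msg["From"] = sender
    msg["To"] = "me@example.com"
    msg["Subject"] = subject
    msg["Date"] = date  # RFC 5322 date format
    return msg

inbox = [
    make_message("alice@example.com", "Budget review", "Mon, 04 Mar 2024 09:30:00 +0000"),
    make_message("bob@example.com", "Team lunch", "Fri, 01 Mar 2024 12:00:00 +0000"),
    make_message("alice@example.com", "Budget follow-up", "Tue, 05 Mar 2024 08:15:00 +0000"),
]

# Filter on the sender field.
from_alice = [m for m in inbox if str(m["From"]) == "alice@example.com"]

# Sort on the date field; the default policy parses Date into a datetime.
by_date = sorted(inbox, key=lambda m: m["Date"].datetime)

# Search within the subject field.
budget = [m for m in inbox if "budget" in str(m["Subject"]).lower()]
```

Because every message carries the same fields, the same three lines of logic work on any inbox.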
Metadata is Standardized
One of the key benefits of using metadata is standardization. For example, email follows widely agreed standards that define the attributes above. In addition, each field has rules that prescribe what counts as valid information for that field. For example, the “Date Sent” field cannot contain the value “Blue” because that is clearly not a valid date. Most email clients enforce these rules when you compose a message.
The benefit of this approach is that when I describe the concept “Email”, you as a user and your email client as a system understand what we’re talking about in a standard way. It means that we can exchange emails consistently because everyone uses the same standard.
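To make the “Date Sent” rule concrete, here is a minimal sketch of field-level validation. The rule table, the ISO 8601 date format and the simplistic address check are assumptions for illustration, not how any particular email client actually validates messages.

```python
from datetime import datetime

def is_valid_date(value):
    """Accept ISO 8601 dates such as '2024-03-04' or '2024-03-04T09:30:00'."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False

def is_valid_address(value):
    # Deliberately simple check: exactly one "@" with text on both sides.
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and bool(domain) and "@" not in domain

# Hypothetical rule table: field name -> function that decides validity.
RULES = {
    "From": is_valid_address,
    "Date Sent": is_valid_date,
}

def validate(record):
    """Return the names of fields that are missing or break their rule."""
    errors = []
    for field, rule in RULES.items():
        if field not in record or not rule(record[field]):
            errors.append(field)
    return errors
```

For example, `validate({"From": "alice@example.com", "Date Sent": "Blue"})` returns `["Date Sent"]`.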
Advanced Ideas on Metadata
Here are a few more advanced ideas about metadata that directly affect how you classify information.
Imagine you are classifying a product in a catalogue. Your catalogue is published in English, French and German. The product list has a metadata attribute for colour (or “color”), and you pick “blue” from a standard list of colours. However, when you publish the catalogue in another language, you will need to provide a localized label for that value. The key concept here is that Blue, Bleu and Blau are not different values; they are all the same colour, simply labelled in their respective languages.
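One way to model this is to store a stable identifier for each value and keep the translated labels alongside it, so the product records its colour once and each edition of the catalogue looks up its own label. The `Term` class, the term IDs and the English fallback below are hypothetical choices for this sketch.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Term:
    """One value in a controlled list: a stable ID plus per-language labels."""
    term_id: str
    labels: Dict[str, str]  # language code -> display label

COLOURS = {
    "blue": Term("blue", {"en": "Blue", "fr": "Bleu", "de": "Blau"}),
    "red": Term("red", {"en": "Red", "fr": "Rouge", "de": "Rot"}),
}

def label(term, lang, default_lang="en"):
    """Label for the requested language, falling back to the default language."""
    return term.labels.get(lang, term.labels[default_lang])

# The product stores the term ID, never a translated label.
product = {"name": "Widget", "colour": "blue"}

def display_colour(product, lang):
    return label(COLOURS[product["colour"]], lang)
```

For example, `display_colour(product, "fr")` returns “Bleu”, while a language with no label, such as `"es"`, falls back to “Blue”. Filtering by colour compares term IDs, so all three editions agree on which products are blue.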
Imagine you are classifying restaurants by location. You assign a restaurant the value “Denver” for its location. However, Denver actually sits within a hierarchy: by selecting “Denver”, you have also implicitly assigned the restaurant to the state “Colorado” and the country “United States”. By organizing metadata values into a structured hierarchy, we can use this to our advantage when we classify, because rules are in place that capture the relationships between the levels, e.g. in this case the relationship between Denver, Colorado and the United States.
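A minimal sketch of how a hierarchy lets a single assignment answer broader questions: each location term records its parent, and a filter on a higher level also matches everything beneath it. The locations other than Denver, Colorado and the United States, and the restaurant names, are placeholders.

```python
# Hypothetical location hierarchy: each term points to its parent (None at the top).
PARENT = {
    "United States": None,
    "Colorado": "United States",
    "Denver": "Colorado",
    "Boulder": "Colorado",
    "Texas": "United States",
    "Austin": "Texas",
}

def ancestors(term):
    """Return the term followed by every level above it."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain

restaurants = [
    {"name": "Restaurant A", "location": "Denver"},
    {"name": "Restaurant B", "location": "Austin"},
    {"name": "Restaurant C", "location": "Boulder"},
]

def in_location(records, place):
    """Records tagged with `place` or with any term beneath it."""
    return [r for r in records if place in ancestors(r["location"])]
```

Filtering on “Colorado” returns Restaurant A and Restaurant C even though neither was tagged with “Colorado” directly.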
Documents and events are different things. The metadata describing a document might include a title, creation date and author, while an event might include a description, attendees and a start date. However, some fields can be used to classify both documents and events. For example, in this scenario, we have added the attribute “Department” to classify both.
The key standardization strategy here is that if we define “Department” centrally, we can reuse that definition for both documents and events. The benefit is that when we talk about, search for or filter by “Department”, we have a consistent definition across all our information. This helps when I want to bring together search results for different types of information, e.g. in this case, “give me a list of all the documents AND all the events assigned to the Human Resources department.”
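The sketch below, assuming hypothetical `Document` and `Event` classes and an invented list of departments, defines the allowed “Department” values once and reuses that single definition in both types, so one query can span documents and events.

```python
from dataclasses import dataclass
from typing import List

# One central definition of the shared "Department" field (values are hypothetical).
DEPARTMENTS = frozenset({"Human Resources", "Finance", "IT"})

def check_department(value):
    """Reject any value not in the central list."""
    if value not in DEPARTMENTS:
        raise ValueError(f"Unknown department: {value!r}")
    return value

@dataclass
class Document:
    title: str
    created: str
    author: str
    department: str

    def __post_init__(self):
        check_department(self.department)

@dataclass
class Event:
    description: str
    start: str
    attendees: List[str]
    department: str

    def __post_init__(self):
        check_department(self.department)

def by_department(items, department):
    """Works across both types because they share one field definition."""
    check_department(department)
    return [item for item in items if item.department == department]

items = [
    Document("Leave policy", "2024-01-15", "A. Author", "Human Resources"),
    Document("Q1 budget", "2024-02-01", "B. Author", "Finance"),
    Event("Benefits information session", "2024-03-12", ["All staff"], "Human Resources"),
]
```

`by_department(items, "Human Resources")` returns the leave policy document and the benefits session together. Because both classes call the same `check_department`, adding or renaming a department happens in one place.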