What Is Metadata? How Does It Improve Search Capabilities?

What is it?

Metadata is, by definition, “data about data”: information that describes a piece of electronically archived data, such as its title, a short description, and the date it was created. A book’s metadata appears on the title and copyright pages: the author, publisher, copyright date, first or subsequent printing, and Library of Congress categorization information. Your own metadata includes your date of birth, residence address, schools attended, and more.
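As a rough illustration, a book's metadata can be kept as a small record that is separate from the book's text. This is only a sketch: the class name, the field names, and the sample values below are hypothetical placeholders, not a real catalog entry or a standard schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class BookMetadata:
    """Hypothetical record of details found on a title and copyright page."""
    title: str
    author: str
    publisher: str
    copyright_year: int
    printing: str


# Placeholder values for illustration only.
record = BookMetadata(
    title="Example Title",
    author="A. Author",
    publisher="Example Press",
    copyright_year=2001,
    printing="first",
)

# The record describes the book without containing any of its text.
print(asdict(record))
```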

How is metadata used?

Metadata is used to distinguish one thing from another or to group similar things together. Using metadata has become very important in the “electronic age”: because so much information exists in electronic form, we need a way to categorize and describe it.

When you save a digital photo to your computer, there is already metadata attached to it. The metadata doesn’t show when you view the picture, but it is there when you open the picture file in photo editing software: the date the picture was taken, the lens aperture, resolution, flash mode, time of day. When you add a title and a description for the photo, you add more metadata. You’re “tagging” it.

Tags that you add to a blog post are also metadata. That handful of terms helps categorize the piece that you created. They describe what it’s about in a few well-chosen words.

Word processing software often records metadata for electronic documents, including the author, date created, and the date it was last modified.

Metadata adds meaning to your electronic information. It can also help you find what you are looking for. For example, if you are searching the Internet for green alternatives to plastic containers, “green alternative” and “plastic” are useful search terms. You should probably search for “environmentally-friendly” references as well. That’s one of the problems with the Internet: there can be several different keywords for the same concept.

How does metadata improve search capabilities?

Any time you add metadata to an electronic file, you make it easier to find later. If you have a list of 200 categories and you always refer to that list when you choose tags for your files, you can easily find everything classified within a particular category.
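The idea of always choosing tags from one fixed list can be sketched in a few lines. This is a minimal illustration, not a description of any particular product: the category names, file names, and the `tag_file` and `files_in_category` helpers are all assumptions made for the example, and a real list might hold 200 entries rather than four.

```python
# Hypothetical category list; a real one might hold 200 entries.
CATEGORIES = {"travel", "recipes", "finance", "family"}


def tag_file(index, filename, tags):
    """Record tags for a file, rejecting any tag that is not on the list."""
    unknown = set(tags) - CATEGORIES
    if unknown:
        raise ValueError(f"tags not in category list: {sorted(unknown)}")
    for tag in tags:
        index.setdefault(tag, set()).add(filename)


def files_in_category(index, category):
    """Return every file tagged with the given category, sorted by name."""
    return sorted(index.get(category, set()))


index = {}
tag_file(index, "lisbon-itinerary.txt", ["travel"])
tag_file(index, "budget-2024.xlsx", ["finance", "family"])
tag_file(index, "paris-photos.zip", ["travel", "family"])

print(files_in_category(index, "travel"))
# ['lisbon-itinerary.txt', 'paris-photos.zip']
```

Rejecting tags that are not on the list is what keeps the categories consistent: a misspelled or improvised tag fails immediately instead of creating a category that nobody will think to search.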

Electronic libraries allow you to combine categories, for example, Mexico AND rain forest AND medicinal plants, without having to look in more than one place. When you narrow your search with other metadata, finding exactly what you need is quick and easy.
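A combined search such as Mexico AND rain forest AND medicinal plants can be thought of as keeping only the items whose tags include every requested term. The sketch below assumes a tiny made-up catalog; the item identifiers, the extra tags, and the `search_all` function are hypothetical and stand in for whatever a real electronic library uses.

```python
# Hypothetical catalog: each item maps to the set of categories assigned to it.
CATALOG = {
    "item-001": {"Mexico", "rain forest", "medicinal plants"},
    "item-002": {"Mexico", "rain forest"},
    "item-003": {"Brazil", "rain forest", "medicinal plants"},
    "item-004": {"Mexico", "rain forest", "medicinal plants", "ethnobotany"},
}


def search_all(catalog, *terms):
    """Return the items tagged with every one of the given terms (logical AND)."""
    wanted = set(terms)
    # An item matches when the wanted terms are a subset of its tags.
    return sorted(item for item, tags in catalog.items() if wanted <= tags)


print(search_all(CATALOG, "Mexico", "rain forest", "medicinal plants"))
# ['item-001', 'item-004']
```

Each added term can only shrink the result set, which is why narrowing a search with more metadata gets you to the right item faster.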

Organizations with large electronic collections may use thousands of terms to represent the concepts in their collections. These lists are called controlled vocabularies: typically 5,000 to 20,000 words and phrases that are used consistently to describe the content (the subject) of the information in the collection, so that individual items are much easier to find later. Each article about medicinal plants is tagged with the same term from the controlled vocabulary, “medicinal plants,” so when a user later searches for articles about medically beneficial plants, those articles are easily found.
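One way a controlled vocabulary can connect a user's wording to the preferred term is a lookup table of variant phrases. The sketch below assumes the vocabulary records such variants; the variant phrases, the article names, and the `normalize` and `search` functions are hypothetical and chosen only to mirror the medicinal plants example.

```python
# Hypothetical table mapping phrases people type to the one preferred term.
PREFERRED_TERM = {
    "medicinal plants": "medicinal plants",
    "medically beneficial plants": "medicinal plants",
    "healing plants": "medicinal plants",
}

# Hypothetical collection: every article is tagged only with preferred terms.
ARTICLES = {
    "article-a": {"medicinal plants"},
    "article-b": {"rain forest"},
}


def normalize(phrase):
    """Map a free-text phrase to its preferred term, or None if unknown."""
    return PREFERRED_TERM.get(phrase.strip().lower())


def search(query):
    """Find articles tagged with the preferred term behind the user's query."""
    term = normalize(query)
    if term is None:
        return []
    return sorted(name for name, tags in ARTICLES.items() if term in tags)


print(search("Medically Beneficial Plants"))
# ['article-a']
```

Because every article carries the same preferred term, the variants only have to be handled once, at search time, rather than on each item in the collection.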

Resources

Content Standard for Digital Geospatial Metadata (CSDGM), Vers. 2 (FGDC-STD-001-1998): the U.S. federal standard for geospatial metadata.

Introduction to Metadata, Version 3.0, edited by Murtha Baca.

The Dublin Core Metadata Initiative: metadata standards for documents.

Resource Description Framework (RDF): standards for the Semantic Web.

Understanding Metadata, NISO Press, National Information Standards Organization.