One must describe a resource adequately and thoroughly so that someone in the future can not only locate it, but also decipher what it is. Metadata does this for us, not only for physical resources like books, but also for digital resources like digital images and research data. Metadata is descriptive information about a resource and its context: the project, experiments, equipment, and creators behind it, as well as information on its use and structure. It is commonly called “data about data,” and it aids in long-term access to a digital resource. But the simple explanation “data about data” does not adequately capture the importance and the breadth of metadata. For example, scientific data sets are being created at a rapid pace, but unless a data set is described adequately (and what counts as adequate varies by circumstance), it will become meaningless within a short time.
Metadata for a resource is embodied in a metadata record. Each metadata record consists of multiple metadata elements, which are specific statements that each describe one characteristic of the resource, such as Title, Author, Description, and Date. Metadata schemas are predefined sets of metadata elements that prescribe which types of information go in each element. Almost every discipline (science, social science, and humanities) has metadata schemas tailored to resources within that discipline. These different communities have different needs, which require different metadata elements, which in turn make up different metadata schemas. Because of the abundance of metadata schemas, we develop crosswalks, or translations from one metadata schema to another. A crosswalk explicitly outlines how elements from one metadata schema map to elements in the other. Problems arise when there is a “many to one” or “one to many” relationship between elements of the two schemas, because information must then be merged or split during translation. Metadata schemas also differ in granularity, which is the level of detail at which the resource is described. For example, for a scientific data set, one can use a metadata schema that describes just the overall data set, its author, and its title, or one can use a schema that allows for description of individual variables and sensors. The latter schema is more granular.
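To make the idea of a crosswalk concrete, here is a minimal sketch in Python. It is only an illustration: the element names, the mapping, and the sample record are all made up for this example and are not taken from any published schema or crosswalk. It shows a "many to one" case, where two source elements land in the same target element, and a granularity loss, where a detailed source element has no place in the simpler target schema.

```python
# Hypothetical crosswalk from an illustrative, more granular source schema
# to a simpler target schema. All element names here are assumptions made
# for the example; they are not copied from a real standard.
CROSSWALK = {
    "title": "Title",
    "originator": "Creator",
    "abstract": "Description",
    "purpose": "Description",  # many-to-one: two source elements share a target
    "publication_date": "Date",
}


def apply_crosswalk(record, crosswalk):
    """Translate a source record into the target schema.

    Returns (target_record, unmapped), where unmapped lists the source
    elements that have no equivalent in the target schema.
    """
    target = {}
    unmapped = []
    for element, value in record.items():
        if element not in crosswalk:
            unmapped.append(element)
            continue
        # Store values in a list so many-to-one mappings keep every value.
        target.setdefault(crosswalk[element], []).append(value)
    return target, unmapped


# Made-up sample record in the source schema.
source_record = {
    "title": "Stream temperature readings",
    "originator": "A. Researcher",
    "abstract": "Hourly temperature measurements.",
    "purpose": "Monitor seasonal change.",
    "publication_date": "2012",
    "sensor_model": "TX-100",  # granular detail with no target element
}

translated, dropped = apply_crosswalk(source_record, CROSSWALK)
print(translated)
print(dropped)
```

In this sketch the two values mapped to Description are kept side by side in a list rather than merged into one string. That is one possible design choice; a real crosswalk would need an explicit rule for how to combine them, and for what to do with elements such as the sensor detail that the less granular schema cannot hold.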
Examples of metadata schemas and possible corresponding applications are shown in the table below.
|Metadata Schema|Applications|
|---|---|
|EAD|Archives, Special Collections|
|Dublin Core|Digital Libraries, Digital Repositories|
|FGDC CSDGM|Geospatial Data|
|Darwin Core|Biological Sciences|
|VRA Core|Visual Resources|
A librarian should be familiar with the common metadata schemas in the areas in which they work. I plan to become a data curation librarian, and I see my job as advocating and consulting for data management best practices, data curation, and open sharing of and access to data sets. There are too many metadata schemas to know them all intimately, but I am familiar with the main ones. The Content Standard for Digital Geospatial Metadata from the Federal Geographic Data Committee is a schema designed for geospatial data. Data from disciplines that work with geospatial information (meteorology, archaeology, geology, geography, etc.) would be well described if a complete record in this schema were created. Another important schema is the Ecological Metadata Language, which is designed for the ecological sciences. A third is the Darwin Core, which is used in the biological sciences to describe biodiversity data such as species occurrences and specimens.
Probably more important than being familiar with every metadata schema is knowing where to find more information when needed. There are resources available on the web to help with locating suitable metadata schemas. One example is the Digital Curation Centre’s Disciplinary Metadata site, which groups metadata standards by discipline and by resource type to help one locate a suitable schema. Another useful resource is Seeing Standards: A Visualization of the Metadata Universe, which was developed at Indiana University by Jenn Riley, who is now at the University of North Carolina at Chapel Hill Library.