The Role of Controlled Vocabularies in Digital Archiving
Abstract
Over the years, and across projects and disciplines, descriptive terminology has shown an unfortunate tendency to wander. Some of the variation is due to evolution in sensor technology, but some may be due to odd abbreviations, typographical errors on rolling decks, institutional practices, or a momentary inspiration to use a new term. As a consequence, we now face challenges in searching digital collections and in designing reusable tools that can be applied across multiple institutions. Practical experience with the SIOExplorer Digital Library of 700 SIO cruises has allowed us to develop techniques to assess variations in metadata values across collections of more than 100,000 digital objects, including datasets, documents and images spanning more than 50 years. The assessment helps to guide the development of controlled vocabularies, which in turn can be used to enable automatic detection of metadata errors and, in some cases, automatic correction. Controlled vocabularies are playing an essential role in extending the technology to the collections of the Woods Hole Oceanographic Institution, including cruises, Alvin dives and ROV operations. Examples of controlled fields include the names of chief scientists, port names, operational areas, science themes, image types, sample types, data types, and processing steps. Controlled vocabularies underlie an emerging set of tools that support web user interfaces, large-scale automatic harvesting of metadata and data, project status assessment, workflow management and overall quality control. They are a key resource for user upload code in the IODP Site Survey Data Bank, prompting and enforcing appropriate metadata values for ocean drilling proposal support data. Compared to previous generations of hard-wired code, access to controlled vocabularies allows a project to evolve flexibly and its code to be ported from one project to another.
These efforts are supported by a Digital Archiving award from the Library of Congress and NSF (CISE/IIS 0455998), as well as by the IODP-MI Site Survey Data Bank subcontract from NSF (OCE 0432224).
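The assessment, error detection, and correction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the SIOExplorer implementation: the port names, the function names, and the use of fuzzy string matching from the standard-library `difflib` module are all assumptions made for illustration.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical controlled vocabulary for a "port name" metadata field.
PORT_VOCABULARY = ["San Diego", "Honolulu", "Papeete", "Valparaiso"]


def normalize(value):
    """Collapse whitespace and case so trivial variants compare equal."""
    return " ".join(value.split()).casefold()


def assess_variation(values):
    """Count how often each distinct raw value appears in a collection."""
    return Counter(values)


def check_value(value, vocabulary, cutoff=0.8):
    """Check one metadata value against a controlled vocabulary.

    Returns (status, term), where status is 'ok' (exact match),
    'corrected' (a vocabulary term was recovered), or 'error' (no match).
    The fuzzy-match cutoff is an illustrative choice, not a tuned value.
    """
    lookup = {normalize(term): term for term in vocabulary}
    key = normalize(value)
    if key in lookup:
        term = lookup[key]
        return ("ok" if value == term else "corrected", term)
    # Fall back to approximate matching to catch likely typos.
    matches = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if matches:
        return ("corrected", lookup[matches[0]])
    return ("error", None)


if __name__ == "__main__":
    raw_values = ["San Diego", "san  diego", "Honolulo", "Xyzzy"]
    print(assess_variation(raw_values))
    for raw in raw_values:
        print(repr(raw), check_value(raw, PORT_VOCABULARY))
```

Because the vocabulary is passed in as data, the same check could in principle be reused for other fields, such as chief scientist names or sample types, without changing the code. Values flagged as `error` would still need human review.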
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2006
- Bibcode: 2006AGUFMIN51A0807M
- Keywords: 1724 Ocean sciences; 3000 MARINE GEOLOGY AND GEOPHYSICS