Extending schema.org for Science Data Discovery
Abstract
Structured data mark-ups help search engines understand the content of web pages. Well-established mark-up schemas for things like events, organizations, people, and recipes have been enormously successful in improving the relevance of search results. For example, structured data tell a search engine that a search for recipes that use basil should prioritize food recipes in which basil is an ingredient over, say, a Wikipedia article about basil plants or an article about how to grow basil in your garden. There is currently momentum for developing schemas and vocabularies for structured data mark-ups to improve internet searches for science data.
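To make the idea of a structured data mark-up for science data concrete, the following is a minimal sketch in Python of how a dataset landing page might describe itself with the schema.org Dataset type, serialized as JSON-LD. The helper names (`dataset_jsonld`, `as_script_tag`), the example.org URL, and all field values are hypothetical and chosen only for illustration; they are not taken from the abstract or from any ESIP extension.

```python
import json


def dataset_jsonld(name, description, url, keywords, variables):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Only core schema.org Dataset properties are used here; a community
    extension would add richer, domain-specific terms on top of these.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "variableMeasured": list(variables),
    }


def as_script_tag(doc):
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical dataset; every value below is an illustrative assumption.
example = dataset_jsonld(
    name="Example Sea Surface Temperature Grid",
    description="Illustrative monthly gridded sea surface temperature.",
    url="https://example.org/datasets/sst-monthly",
    keywords=["oceanography", "sea surface temperature"],
    variables=["sea_surface_temperature"],
)

print(as_script_tag(example))
```

Embedding the printed element in a landing page's HTML is the usual way of exposing such a description to crawlers. How a given search engine ranks or displays the result is outside what this sketch shows.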
Schema.org is a community-driven organization founded by Google, Microsoft, Yahoo and Yandex that has developed schemas and vocabularies for marking up internet pages. Schema.org vocabularies include terms for describing objects, events, and the relationships among them. Schema.org was developed with the intention of providing a common core of terms that could be extended by communities-of-interest according to their own requirements. Shared vocabularies that communities-of-interest build on the schema.org core help ensure that when an organization annotates its web pages using schema.org guidelines and its community's vocabulary, search engines will understand the content and prioritize it appropriately in search results. Recently, major search engines including Google and Bing have committed to supporting search for science data by leveraging schema.org mark-ups. A minimal vocabulary for describing datasets has already been developed. However, more is needed to enable mark-ups that are rich enough to help guide users to highly relevant search results. The Earth Science Information Partners (ESIP) Semantic Technologies Committee is launching an effort to develop, manage, and socialize community-based extensions to schema.org vocabularies in order to create rich, robust, and semantically grounded mark-ups for dataset landing pages and other science data-related web sites. This presentation will discuss the current status of schema.org mark-ups for datasets, the benefits of semantic enhancements to schema.org using JSON-LD, and a roadmap for community-driven extensions led by ESIP.
- Publication:
- AGU Fall Meeting Abstracts
- Pub Date:
- December 2018
- Bibcode:
- 2018AGUFMIN31B..19H
- Keywords:
- INFORMATICS: 1904 Community standards;
- INFORMATICS: 1908 Cyberinfrastructure;
- INFORMATICS: 1946 Metadata;
- INFORMATICS: 1970 Semantic web and semantic integration