Data-Driven Community Building: Measuring and Improving Connectivity in Domain Repositories
Abstract
Domain repositories can be an integral part of extensive community support systems that extend from proposal planning and writing, through project initiation and implementation, data collection, management, and archiving, to publication of results and access to data by other community members. These long-term relationships are reflected in multiple contributions (data, software, results, papers, ...) by community members, and recognizing these contributions should be an important community-building best practice for these repositories. Identifiers for people and organizations are critical for recognizing community members and, equally important, for making connections between them and the various objects in the research ecosystem. The figure demonstrates connections that can be made once identifiers are integrated into the research ecosystem. Most domain repositories provide DOIs for datasets in the repository. The metadata for those DOIs can include identifiers for some authors (ORCIDs) along with the names of the organizations they are affiliated with (affiliations). In practice, most authors in these metadata records do not have ORCIDs, but if an author has an ORCID in at least one record, that ORCID can be spread across all of the datasets they have contributed to, increasing connectivity across the repository. Affiliations can also be spread across multiple contributions, with some caveats. If identifiers (i.e., RORs) exist and can be found for the affiliated organizations, they can be inserted into the metadata, again increasing connectivity. Many domain repositories maintain lists of research papers that have used data from the archive. Metadata for these papers provide another potential source of identifiers and affiliations, which can also be harvested and spread across the repository, again improving connectivity. These ideas and techniques were applied to UNAVCO, a geodesy data repository with a well-developed community and over 5000 archived datasets. 
Before these techniques are applied, connectivity in the repository is below 10% for dataset contributors and 0% for RORs. Applying them can increase connectivity to 56% for contributors and 49% for RORs.
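The spreading idea described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: connectivity is taken to be the fraction of contributor entries that carry an identifier, contributors are matched by a normalized exact name, and a name that maps to more than one identifier is left untouched. The record layout, the function names (`normalize`, `connectivity`, `spread_identifiers`), and the sample DOIs, names, and ORCID are all hypothetical placeholders, not UNAVCO data.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase and collapse whitespace so trivially different spellings match.

    Assumption: exact match on the normalized name is good enough for a sketch.
    """
    return " ".join(name.lower().split())


def connectivity(records, key="orcid"):
    """Fraction of contributor entries that carry the identifier `key`.

    Assumption: this is one plausible definition of connectivity.
    """
    entries = [c for r in records for c in r["contributors"]]
    if not entries:
        return 0.0
    return sum(1 for c in entries if c.get(key)) / len(entries)


def spread_identifiers(records, key="orcid"):
    """Return a copy of `records` in which an identifier seen on any entry for a
    name is copied to every entry with the same normalized name."""
    # First pass: collect every identifier observed for each normalized name.
    found = defaultdict(set)
    for r in records:
        for c in r["contributors"]:
            if c.get(key):
                found[normalize(c["name"])].add(c[key])

    # Second pass: fill gaps, but only when the name maps to exactly one
    # identifier; ambiguous names are skipped rather than guessed.
    result = []
    for r in records:
        contributors = []
        for c in r["contributors"]:
            filled = dict(c)
            ids = found.get(normalize(c["name"]), set())
            if not filled.get(key) and len(ids) == 1:
                filled[key] = next(iter(ids))
            contributors.append(filled)
        result.append({**r, "contributors": contributors})
    return result


# Hypothetical sample records (placeholder DOIs, names, and ORCID).
records = [
    {"doi": "10.0000/example-1",
     "contributors": [{"name": "A. Researcher", "orcid": "0000-0000-0000-0001"},
                      {"name": "B. Scientist"}]},
    {"doi": "10.0000/example-2",
     "contributors": [{"name": "a.  researcher"},
                      {"name": "B. Scientist"}]},
]

before = connectivity(records)
after = connectivity(spread_identifiers(records))
print(f"{before:.0%} -> {after:.0%}")  # prints "25% -> 50%"
```

The same two functions could be pointed at affiliation identifiers by passing a different `key` (for example a hypothetical `"ror"` field), although organization names typically vary more than personal names, which is presumably one of the caveats the abstract mentions.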
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2021
- Bibcode: 2021AGUFMIN45E0489R