Machine Learning is its Own Language: a Common Tool for the ML Community
Abstract
The Planetary Data System (PDS) archives data across several scientific disciplines. PDS provides sub-models (formerly known as data dictionaries, and later as namespaces) that allow data providers to describe special properties of their data, such as image, cartographic, or spectral properties, to prospective users. A Machine Learning Analysis sub-model was developed to support the description and classification of products derived from Machine Learning.
The Machine Learning sub-model (https://github.com/pds-data-dictionaries/ldd-ml) defines classes and attributes used to describe products generated by machine learning methods applied to PDS data. This information provides traceability for product provenance and gives credit to the creators. The sub-model was created by a team in the PDS Imaging Node, who consulted with subject matter experts in Machine Learning at the Jet Propulsion Laboratory. The domain experts supplied the team with publications documenting the process involved in creating machine learning models, and the choice of keywords in the Machine Learning sub-model is influenced by these previous efforts, including the use of model cards and model provenance. The Machine Learning sub-model allows Machine Learning products to adhere to the FAIR Guiding Principles for scientific data management and stewardship (https://www.nature.com/articles/sdata201618). The FAIR principles strive for greater findability, accessibility, interoperability, and reuse of scientific data. Adhering to them not only supports the long-term archival health of Machine Learning products but also lays a path toward greater reach and cross-collaboration within the Machine Learning community and beyond. With a common language come tools that empower reproducibility, reuse, and discovery of Machine Learning scholarship.
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFMIN22D0333B