Overview
This best practices document describes how to use Machine Learning keywords in collection and service records in the Common Metadata Repository (CMR) to improve metadata and discoverability of machine learning data and models.
Machine Learning Model keywords represent predictive models that, when trained on a set of data containing certain features, enable a computer to identify similar features in other data. Machine Learning Training Data keywords represent the input data necessary for running a machine learning model.
Best Practices
Science keywords from the GCMD Keyword Management System (KMS) are important for the precise search and retrieval of data and should accurately represent the dataset being described. At a minimum, one science keyword hierarchy must be provided, and this hierarchy must go down to the 'Term' level of detail.
- The Earth Science keywords should be picked from the GCMD KMS. The list of keywords can be found at https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science
- The Machine Learning Training Data keywords should be picked from the GCMD KMS. The list of keywords can be found at https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science%20Services/676bb4aa-1452-4527-aa05-83233ad5d01d
- The Machine Learning Model keywords should be picked from the GCMD Keyword Management System. The list of keywords can be found at https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science%20Services/fe4392b0-13a9-43ff-bacc-f44a65aed4fa
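The minimum requirement above (at least one science keyword hierarchy that reaches the 'Term' level) can be checked before a record is submitted. The following is a minimal sketch, assuming the record is held as a Python dictionary with a UMM-C style `ScienceKeywords` list whose entries use the `Category`, `Topic`, and `Term` fields. The function names and the example record are hypothetical and shown only for illustration; the check does not confirm that the values are valid GCMD KMS keywords.

```python
# Hierarchy levels a science keyword needs in order to reach 'Term'.
REQUIRED_LEVELS = ("Category", "Topic", "Term")


def reaches_term_level(keyword: dict) -> bool:
    """Return True if Category, Topic, and Term are all non-empty strings."""
    return all(
        isinstance(keyword.get(level), str) and keyword[level].strip() != ""
        for level in REQUIRED_LEVELS
    )


def has_valid_science_keyword(record: dict) -> bool:
    """Return True if at least one keyword hierarchy reaches the 'Term' level."""
    return any(reaches_term_level(kw) for kw in record.get("ScienceKeywords", []))


# Hypothetical record fragment: the first hierarchy reaches 'Term',
# the second stops at 'Topic'.
example_record = {
    "ScienceKeywords": [
        {"Category": "EARTH SCIENCE", "Topic": "ATMOSPHERE", "Term": "PRECIPITATION"},
        {"Category": "EARTH SCIENCE", "Topic": "OCEANS"},
    ]
}

if __name__ == "__main__":
    print(has_valid_science_keyword(example_record))
```

Only the structure is tested here; the keyword values themselves still need to be picked from the GCMD KMS lists linked above.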
Machine Learning Training Data (Model Training Data):
- Describe your machine learning training data as a UMM Collections (UMM-C) compliant collection record in CMR
- Add relevant 'Machine Learning Training Data' keywords describing the input data necessary for running a machine learning model
- Add relevant 'Earth Science' keywords describing what is being measured as part of the training data set
- Add the CMR tag 'machine.learning'. Note:
- To add a tag to a collection record, see the Search API Documentation
- Groups can apply the CMR tag to their own records if given the proper permissions and approvals by ESDIS.
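The tagging step can be sketched as follows. This is an illustration under stated assumptions, not a substitute for the Search API Documentation: it assumes the tag association endpoint has the form `/search/tags/<tag-key>/associations`, that it accepts a JSON list of `{"concept-id": ...}` objects, and that a bearer token is an acceptable `Authorization` value. The collection concept ID and token are hypothetical placeholders. The code only builds the request and the search URL; nothing is sent.

```python
import json
import urllib.parse
import urllib.request

CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"  # assumed production root
TAG_KEY = "machine.learning"


def build_tag_association_request(collection_ids, token):
    """Build (but do not send) a POST request tagging the given collections."""
    body = json.dumps([{"concept-id": cid} for cid in collection_ids]).encode("utf-8")
    return urllib.request.Request(
        url=f"{CMR_SEARCH_ROOT}/tags/{TAG_KEY}/associations",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed token scheme
        },
        method="POST",
    )


def build_tagged_collection_search_url():
    """Build a collection search URL filtered by the tag key."""
    query = urllib.parse.urlencode({"tag_key": TAG_KEY})
    return f"{CMR_SEARCH_ROOT}/collections.json?{query}"


if __name__ == "__main__":
    # "C1234567890-PROV" and "MY_TOKEN" are placeholders, not real values.
    request = build_tag_association_request(["C1234567890-PROV"], "MY_TOKEN")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    print(build_tagged_collection_search_url())
```

Passing the built request to `urllib.request.urlopen` would submit it; that call succeeds only for accounts that hold the permissions described above.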
Machine Learning Models (Model):
- Describe your machine learning model as a UMM Services (UMM-S) compliant record in CMR
- Add relevant 'Machine Learning Model' keywords describing the type of model that was trained on the data
- Create a collection association to relevant Machine Learning Training Data collections in the CMR.
- To associate a service with one or more collections using the Metadata Management Tool (MMT), see the MMT User Guide
- To associate a service with one or more collections using the CMR API, see the CMR Search API Documentation
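For the CMR API route, the association between a model's service record and its training data collections might be built as in the sketch below. It assumes the service association endpoint has the form `/search/services/<service-concept-id>/associations` and accepts a JSON list of `{"concept_id": ...}` objects; confirm both against the CMR Search API Documentation before use. The concept IDs and token are hypothetical placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request

CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"  # assumed production root


def build_service_association_request(service_id, collection_ids, token):
    """Build (but do not send) a POST request linking a service to collections."""
    body = json.dumps([{"concept_id": cid} for cid in collection_ids]).encode("utf-8")
    return urllib.request.Request(
        url=f"{CMR_SEARCH_ROOT}/services/{service_id}/associations",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed token scheme
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder IDs: one UMM-S model record and two training data collections.
    request = build_service_association_request(
        "S1234567890-PROV",
        ["C1234567890-PROV", "C1234567891-PROV"],
        "MY_TOKEN",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Listing every relevant training data collection in a single request keeps the model record's associations together in one step.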