The CMR includes metadata that originate in three dialects: DIF, ECHO, and ISO. The largest portion of CMR collection records is inserted into the CMR in DIF format; these records are referred to here as the SciOps collection. The second largest portion, metadata for NASA collections, originates from ECHO and is inserted into the CMR in ECHO format; these records are referred to here as the NASA collection. A third group of metadata in the CMR originates from other agencies around the world and is generally in DIF or, possibly, in ISO; these records are referred to here as the Other collection.
The DIF and ECHO dialects were originally developed to facilitate discovery of collections in the Global Change Master Directory or ECHO. The content of these “discovery” dialects is translated into the ISO dialects that are the eventual target for CMR. This translation is generally done without augmentation, so the content does not change very much.
Metadata providers have a choice about which metadata dialect(s) they use to submit metadata to the CMR. We compared the IDN and NASA collections to understand how this choice affects the metadata content.
The table below shows the collections included in this evaluation as well as the record count for each collection.
Table 1. Collections and record counts
Collection | Organization | Count |
ASF | NASA | 161 |
CDDIS | NASA | 38 |
GES_DISC | NASA | 1044 |
GHRC | NASA | 361 |
LAADS | NASA | 130 |
LANCEAMSR2 | NASA | 6 |
LANCEMODIS | NASA | 154 |
LARC | NASA | 406 |
LARC_ASDC | NASA | 606 |
LPDAAC_ECS | NASA | 285 |
NSIDC_ECS | NASA | 223 |
NSIDC_V0 | NASA | 784 |
OB_DAAC | NASA | 132 |
OMNIRT | NASA | 5 |
ORNL_DAAC | NASA | 1216 |
PODAAC | NASA | 603 |
SEDAC | NASA | 202 |
USGS_EROS | NASA | 11 |
AU_AADC | IDN | 2559 |
ESA | IDN | 103 |
EUMETSAT | IDN | 23 |
ISRO | IDN | 19 |
JAXA | IDN | 340 |
LM_FIRMS | IDN | 1 |
NOAA_NCEI | IDN | 5448 |
USGS_LTA | IDN | 130 |
This analysis compares item usage (elements and attributes) in the 18 NASA collections with the 8 non-NASA (IDN) collections. The evaluation identifies items that exist in collections as well as items that are complete in collections. For an item to exist in a collection, it must be present in at least one metadata record in the collection. For an item to be complete in a collection, it must be present in all metadata records in the collection.
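The two definitions above amount to a set union and a set intersection over the records of a collection. The following is a minimal sketch, assuming each record has already been reduced to the set of item paths it contains; the function names and the three-record sample collection are hypothetical and are not part of the evaluation tools.

```python
from typing import Iterable, Set


def existing_items(records: Iterable[Set[str]]) -> Set[str]:
    """Items present in at least one record of the collection."""
    result: Set[str] = set()
    for items in records:
        result |= items
    return result


def complete_items(records: Iterable[Set[str]]) -> Set[str]:
    """Items present in every record of the collection."""
    record_list = list(records)
    if not record_list:
        return set()  # an empty collection has no complete items
    result = set(record_list[0])
    for items in record_list[1:]:
        result &= items
    return result


# Hypothetical collection of three records, each a set of item paths.
collection = [
    {"/gmd:identificationInfo/gmd:abstract", "/gmd:contact/gmd:organisationName"},
    {"/gmd:identificationInfo/gmd:abstract"},
    {"/gmd:identificationInfo/gmd:abstract",
     "/gmd:identificationInfo/gmd:status/@codeListValue"},
]

print(sorted(existing_items(collection)))
print(sorted(complete_items(collection)))
```

In this sample, three items exist in the collection but only the abstract is complete, because it is the only item found in all three records.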
Item usage for NASA collections and IDN collections is shown below using bubble charts. The bubble chart interpretation graphic provides a schematic for interpreting the bubble plots.
Bubble Chart Interpretation
The CMR team are in the difficult position of trying to make a coherent and useful metadata repository by collecting metadata from many organizations and projects that have different goals and needs. This presents a challenge as the CMR evolves and new requirements emerge. Metadata managers need to account for content that is not provided by the metadata providers. At the current time, this is between three and five percent of the content.
The solution for NASA, IDN, and SciOps provided collections is to add the string “Not provided” to expected fields that have no content. This clearly indicates that content is missing, but tools that read or translate the metadata must be aware of this convention to get meaningful results. The tools we use for evaluating metadata completeness are agnostic to element and attribute values, so they would count these placeholder fields as populated. Therefore the analysis presented below, which compares NASA Complete with IDN Complete, does not include metadata fields with a value of 'Not provided'. The metadata fields with 'Not provided' values that were excluded from the NASA Complete vs IDN Complete evaluation are:
/gmi:acquisitionInformation/gmi:instrument/gmi:type
/gmd:identificationInfo/gmd:pointOfContact/gmd:organisationName
/gmd:identificationInfo/gmd:abstract
/gmd:identificationInfo/gmd:descriptiveKeywords/gmd:keyword
/gmd:identificationInfo/gmd:resourceConstraints/gmd:useLimitation
/gmd:contentInfo/gmd:processingLevelCode/gmd:code
/gmd:identificationInfo/gmd:pointOfContact/gmd:individualName
/gmi:acquisitionInformation/gmi:instrument/eos:sensor/eos:type
/gmd:contact/gmd:organisationName
/gmi:acquisitionInformation/gmi:platform/gmi:identifier/gmd:code
/gmd:contentInfo/gmd:dimension/gmd:otherProperty/gco:Record/eos:AdditionalAttributes/eos:AdditionalAttribute/eos:reference/eos:description
/gmd:identificationInfo/gmd:processingLevel/gmd:code
/gmi:acquisitionInformation/gmi:platform/gmi:description
/gmd:identificationInfo/gmd:status/gmd:MD_ProgressCode
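A tool that is aware of the placeholder convention could filter such values before counting a field as present. The sketch below is only an illustration under stated assumptions: a record is simplified to a dictionary mapping each item path to a single text value, the comparison ignores case and surrounding whitespace, and the names `is_missing` and `provided_items` are hypothetical.

```python
from typing import Dict, Optional, Set

# Placeholder string from the convention described above, compared
# case-insensitively (an assumption made for this sketch).
PLACEHOLDER = "not provided"


def is_missing(value: Optional[str]) -> bool:
    """True if the value is absent, empty, or the placeholder string."""
    if value is None:
        return True
    return value.strip().lower() in ("", PLACEHOLDER)


def provided_items(record: Dict[str, Optional[str]]) -> Set[str]:
    """Item paths in a record whose value carries real content."""
    return {path for path, value in record.items() if not is_missing(value)}


# Hypothetical record: one real value, one placeholder.
record = {
    "/gmd:contact/gmd:organisationName": "Example Data Center",
    "/gmd:identificationInfo/gmd:abstract": "Not provided",
}

print(provided_items(record))
```

Only the organisation name survives the filter here, so a completeness count built on `provided_items` would treat the abstract as missing rather than present.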
Table 3 lists fields in the IDN group and, for each data provider, the percentage of records in which the value of the field is 'Not provided'. In seven cases these missing-data flags make up over 50% of the content for a field on average, and in two cases (the processing level codes) the values in all records are 'Not provided'.
Paths - Provider | Count | AU_AADC | LM_FIRMS | EUMETSAT | USGS_LTA | NOAA_NCEI | ISRO | JAXA | ESA | Average |
Number of Records | | 2559 | 1 | 58 | 130 | 5488 | 23 | 340 | 103 | |
/gmd:contentInfo/gmd:dimension/gmd:otherProperty/gco:Record/eos:AdditionalAttributes/eos:AdditionalAttribute/eos:reference/eos:description | 8 | 100% | 100% | 45% | 100% | 92% | 91% | 100% | 100% | 91% |
/gmd:contentInfo/gmd:processingLevelCode/gmd:code | 8 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
/gmd:identificationInfo/gmd:processingLevel/gmd:code | 8 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
/gmi:acquisitionInformation/gmi:instrument/gmi:type | 8 | 33% | 100% | 78% | 82% | 69% | 100% | 96% | 97% | 82% |
/gmd:identificationInfo/gmd:citation/gmd:edition | 7 | 100% | 0% | 40% | 98% | 86% | 87% | 28% | 95% | 67% |
/gmd:identificationInfo/gmd:citation/gmd:identifier/gmd:version | 7 | 100% | 0% | 40% | 98% | 86% | 87% | 28% | 95% | 67% |
/gmi:acquisitionInformation/gmi:platform/gmi:description | 7 | 100% | 0% | 100% | 100% | 100% | 91% | 100% | 100% | 86% |
/gmd:identificationInfo/gmd:descriptiveKeywords/gmd:keyword | 6 | 45% | 0% | 60% | 76% | 31% | 0% | 6% | 30% | 31% |
/gmd:identificationInfo/gmd:status/@codeListValue | 6 | 1% | 0% | 76% | 13% | 45% | 0% | 99% | 43% | 35% |
/gmd:identificationInfo/gmd:status/gmd:MD_ProgressCode | 6 | 1% | 0% | 76% | 13% | 45% | 0% | 99% | 43% | 35% |
/gmi:acquisitionInformation/gmi:platform/gmi:identifier/gmd:code | 6 | 45% | 0% | 60% | 76% | 31% | 0% | 6% | 30% | 31% |
/gmd:identificationInfo/gmd:abstract | 1 | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 13% |
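The Count and Average columns in the table above appear consistent with a simple rule: Count is the number of providers with a non-zero percentage, and Average is the unweighted mean of the eight provider percentages, rounded half up. That rule is inferred from the table values rather than stated in the analysis, so the sketch below is a hedged reconstruction; `summarize_row` is a hypothetical name.

```python
from typing import List, Tuple


def summarize_row(percentages: List[float]) -> Tuple[int, int]:
    """Return (count of non-zero providers, unweighted mean rounded half up)."""
    count = sum(1 for p in percentages if p > 0)
    mean = sum(percentages) / len(percentages)
    # int(x + 0.5) rounds half up for non-negative x; Python's built-in
    # round() would round 12.5 to 12 instead.
    return count, int(mean + 0.5)


# Provider percentages copied from two rows of the table above.
instrument_type = [33, 100, 78, 82, 69, 100, 96, 97]
abstract = [0, 100, 0, 0, 0, 0, 0, 0]

print(summarize_row(instrument_type))
print(summarize_row(abstract))
```

Because the mean is unweighted, a provider with a single record (LM_FIRMS) influences the Average as much as a provider with thousands of records.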
Chart 1 below compares item completeness in NASA collections with IDN collections. The X axis shows the number of NASA collections that are complete with respect to an item. The Y axis shows the number of IDN collections that are complete with respect to an item. The bubble size shows the number of items that share that pair of collection counts.
The large red bubble in the upper right corner of the plot shows items (elements and attributes) that are complete in all 18 NASA collections and in all 8 IDN collections. This bubble includes 27 items, as shown in the legend. The large blue bubble in the lower left corner of the plot shows the items that are complete in 1 NASA collection and in 0 IDN collections. This bubble includes 38 items, as shown in the legend. Click on the image to view the data for this chart.
Chart 1: NASA Complete vs IDN Complete
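The data behind a bubble chart of this kind can be derived by counting how many items fall on each (NASA count, IDN count) coordinate. The following is a minimal sketch, assuming the per-item counts are available as two dictionaries; the function name and the sample counts are hypothetical and do not reproduce the actual chart data.

```python
from collections import Counter
from typing import Dict, Tuple


def bubble_sizes(nasa_counts: Dict[str, int],
                 idn_counts: Dict[str, int]) -> Counter:
    """Map each (x, y) coordinate to the number of items located there.

    x is the number of NASA collections in which an item is complete,
    y is the number of IDN collections in which it is complete.
    """
    items = set(nasa_counts) | set(idn_counts)
    return Counter(
        (nasa_counts.get(item, 0), idn_counts.get(item, 0)) for item in items
    )


# Hypothetical per-item counts for four items.
nasa_counts = {"item_a": 18, "item_b": 18, "item_c": 1, "item_d": 5}
idn_counts = {"item_a": 8, "item_b": 8, "item_d": 8}

sizes = bubble_sizes(nasa_counts, idn_counts)
print(sizes[(18, 8)])  # items complete in all NASA and all IDN collections
```

An item missing from one of the dictionaries is treated as complete in zero collections of that group, which places it on the corresponding axis.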
To access the data for this chart, click on the chart graphic above to view an interactive version in Google Sheets. The interactive version enables data identification for each bubble and includes a look up table for identifying the ISO elements associated with the bubble. To access and use the interactive version:
Twenty-seven items (elements and attributes) are complete in all NASA and all IDN collections. To identify these items, click on the chart above to access the spreadsheet and select the All NASA and All IDN filtered view from the toolbar.
Only one element is complete in all NASA collections and some IDN collections: gmd:identificationInfo/gmd:citation/gmd:edition. This reflects the fact that this element is used in the NASA collections to provide version information for resources. To identify this item, click on the chart above to access the spreadsheet and select the All NASA and Some IDN filtered view from the toolbar.
Nineteen items (elements and attributes) are complete in all IDN collections and complete in a smaller number of NASA collections. These are the bubbles along the top of the chart. To identify these items, click on the chart above to access the spreadsheet and select the All IDN and Some NASA filtered view from the toolbar.
Sixty-three elements are complete in no IDN collections and some NASA collections. These are the bubbles on the X axis in the chart. None are complete in more than nine NASA collections. Many of these elements are related to keyword thesaurus and additional attribute information. It is interesting to note that keyword thesaurus information is completely absent from the IDN collections, reflecting the fact that GCMD keywords are used in those collections, perhaps as a requirement of participation in the IDN. To identify these items, click on the chart above to access the spreadsheet and select the Some NASA and No IDN filtered view from the toolbar.
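The groupings used above (all, some, or no collections in each group) can be expressed as a small classification over the two counts. This sketch assumes that "some" means at least one collection but fewer than all of them, and it uses the totals of 18 NASA and 8 IDN collections from this evaluation; the function names are hypothetical, and the returned labels are descriptive only and are not the names of the spreadsheet views.

```python
from typing import Tuple

N_NASA = 18  # NASA collections in this evaluation
N_IDN = 8    # IDN collections in this evaluation


def level(count: int, total: int) -> str:
    """Classify a count as 'No', 'Some', or 'All' of the collections."""
    if count == 0:
        return "No"
    if count == total:
        return "All"
    return "Some"  # assumed: 1 <= count < total


def classify(nasa_count: int, idn_count: int) -> Tuple[str, str]:
    """Return the (NASA level, IDN level) pair for one item."""
    return level(nasa_count, N_NASA), level(idn_count, N_IDN)


print(classify(18, 8))  # upper right corner of the chart
print(classify(9, 0))   # on the X axis
print(classify(4, 8))   # along the top of the chart
```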
Chart 2 below compares item existence (elements and attributes) in NASA collections with IDN collections. The X axis shows the number of NASA collections that include an item. The Y axis shows the number of IDN collections that include an item. The bubble size shows the number of items that share that pair of collection counts.
The large blue bubble in the upper right corner of the plot shows the items (elements and attributes) that exist in all 18 NASA collections and in all 8 IDN collections. This bubble includes 51 items, as shown in the legend. The large red bubble in the lower left corner of the plot shows items that exist in one NASA collection and in zero IDN collections. This bubble includes 44 items, as shown in the legend. The data for Chart 2 is accessible in Google Sheets; to view the data and bubble chart there, click on the image.
Chart 2: NASA Exists vs IDN Exists
To access the data for this chart, click on the chart graphic above to view an interactive version in Google Sheets. The interactive version enables data identification for each bubble and includes a look up table for identifying the ISO items associated with the bubble. To access and use the interactive version:
Fifty-one items (elements and attributes) exist in all NASA and all IDN collections. To identify these items, click on the chart above to access the spreadsheet and select the All NASA and All IDN filtered view from the toolbar.
Fifty-two elements exist in all IDN collections and some NASA collections. These are the bubbles along the top of the chart. To identify these items, click on the chart above to access the spreadsheet and select the All IDN and Some NASA filtered view from the toolbar.
Two hundred twenty-eight items (elements and attributes) exist in some NASA collections and no IDN collections. To identify these items, click on the chart above to access the spreadsheet and select the Some NASA and No IDN filtered view from the toolbar.
To view just the elements or just the attributes, select the Elements or the Attributes filtered view from the toolbar.