Interim Collection Comparison Report for Big Earth Data Initiative
Metadata Source: NASA Common Metadata Repository
Metadata Dialect: ISO 19115-2
Evaluation Target: People and Organizations
Ted Habermann – The HDF Group
Metadata serves an essential function in connecting users to people and organizations to help them access, use, and understand data. The ESDIS Common Metadata Repository (CMR) recognizes the importance of this type of information and includes two related elements in the Unified Metadata Model (UMM) Common Profile: Responsibility and Party. Because they belong to the Common Profile, these elements are included in all other UMM Profiles.
The Responsibility element broadly defines responsibilities related to data resources using the position of the element in the metadata model hierarchy. The UMM-Common Profile defines five responsibilities: Metadata Contact, Resource Author / Originator, Point of Contact, Distributor, and Processor. Each of these responsibilities can have multiple people or organizations (termed parties) associated with it. A RoleCode that is chosen from the standard ISO Codelist describes details of the roles of those parties.
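The relationship between responsibilities, parties, and role codes described above can be sketched as a small data model. This is only an illustration: the class names, field names, and example values are assumptions made for this sketch and are not UMM definitions. The role values are those of the ISO 19115 CI_RoleCode codelist.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RoleCode(Enum):
    """Values of the ISO 19115 CI_RoleCode codelist."""
    RESOURCE_PROVIDER = "resourceProvider"
    CUSTODIAN = "custodian"
    OWNER = "owner"
    USER = "user"
    DISTRIBUTOR = "distributor"
    ORIGINATOR = "originator"
    POINT_OF_CONTACT = "pointOfContact"
    PRINCIPAL_INVESTIGATOR = "principalInvestigator"
    PROCESSOR = "processor"
    PUBLISHER = "publisher"
    AUTHOR = "author"


@dataclass
class Party:
    """A person or organization; field names are hypothetical."""
    role: RoleCode
    organisation_name: Optional[str] = None
    individual_name: Optional[str] = None
    email: Optional[str] = None


@dataclass
class Responsibility:
    """One of the five UMM-Common responsibilities and its parties."""
    name: str
    parties: List[Party] = field(default_factory=list)


# Hypothetical example: one responsibility with two associated parties.
distributor = Responsibility(
    name="Distributor",
    parties=[
        Party(role=RoleCode.DISTRIBUTOR, organisation_name="Example Archive"),
        Party(role=RoleCode.POINT_OF_CONTACT, individual_name="Jane Example",
              email="help@example.org"),
    ],
)
```

A responsibility with several parties, as in this example, is one way that occurrence counts greater than 1 can arise for role codes in a single record.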
Understanding usage, completeness and consistency of Responsibilities and Parties in ESDIS metadata is an important first step towards providing consistent and complete services to users of those data. This report provides an initial assessment of these characteristics.
Metadata Collection | Identifier | # Records |
Alaska Satellite Facility (ASF) | A | 32 |
Crustal Dynamics Data Information System (CDDIS) | B | 36 |
Global Hydrology Resource Center (GHRC) | C | 267 |
Goddard Space Flight Center (GSFC) Simple, Scalable, Script-based Science Processing Archive (S4PA) | D | 491 |
Level 1 and Atmosphere Archive and Distribution System (LAADS) | E | 53 |
Land, Atmosphere Near real-time Capability for EOS (LANCEMODIS) | F | 43 |
Langley Research Center (LARC) | G | 319 |
Langley Research Center (LARC) Atmospheric Science Data Center | H | 134 |
Land Process DAAC - EOS Core System (LPDAAC_ECS) | I | 164 |
National Snow and Ice Data Center Version 0 (NSIDCV0) | J | 280 |
National Snow and Ice Data Center EOS Core System (NSIDC_ECS) | K | 77 |
Ozone Monitoring Instrument - Near Real-Time | L | 1 |
Physical Oceanography DAAC (PODAAC) | M | 80 |
Socioeconomic Data and Applications Center (SEDAC) | N | 170 |
U.S. Geological Survey Earth Resources Observation Systems (USGS_EROS) | O | 11 |
Table 1. Metadata collections and number of records
We examined 2158 metadata records from 15 collections (Table 1) extracted from the CMR during October 2015. For each recommended responsibility we provide a table that gives the average number of occurrences per record of each element of the associated party in each of these collections. A value of 1 or more typically (although not necessarily) indicates that the element is included one or more times in each record in a collection. A value below 1.0 is typically the fraction of records in a collection that include the metadata element. Empty cells indicate a value of 0: the element is completely missing from the collection. Two other collections (Oak Ridge DAAC and Ocean Biology Processing Group) are missing from this interim report for technical reasons. Data for those collections will be added as soon as they are available.
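The average-occurrence measure described above can be illustrated with a minimal sketch. It is not the tool used to produce this report. The function names, the `make_record` helper, and the sample records are assumptions made for the example; the namespace URIs are the standard ISO 19139 ones.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs.
NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Path relative to the gmi:MI_Metadata root element.
ORG_NAME = "gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName"


def make_record(n_contacts: int) -> str:
    """Build a hypothetical ISO record with n_contacts metadata contacts."""
    contact = (
        "<gmd:contact><gmd:CI_ResponsibleParty><gmd:organisationName>"
        "<gco:CharacterString>Example Org</gco:CharacterString>"
        "</gmd:organisationName></gmd:CI_ResponsibleParty></gmd:contact>"
    )
    return (
        '<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi" '
        'xmlns:gmd="http://www.isotc211.org/2005/gmd" '
        'xmlns:gco="http://www.isotc211.org/2005/gco">'
        + contact * n_contacts
        + "</gmi:MI_Metadata>"
    )


def count_elements(record_xml: str, path: str) -> int:
    """Number of elements matching path in one record."""
    return len(ET.fromstring(record_xml).findall(path, NS))


def average_occurrences(records, path) -> float:
    """Total occurrences divided by the number of records."""
    if not records:
        return 0.0
    return sum(count_elements(r, path) for r in records) / len(records)


def fraction_with_element(records, path) -> float:
    """Share of records that contain the element at least once."""
    if not records:
        return 0.0
    return sum(count_elements(r, path) > 0 for r in records) / len(records)


collection = [make_record(1), make_record(0), make_record(2), make_record(0)]
print(average_occurrences(collection, ORG_NAME))    # 0.75
print(fraction_with_element(collection, ORG_NAME))  # 0.5
```

In this hypothetical collection the average is 0.75 even though only half of the records contain the element, which is why a value below 1.0 is described as typically, rather than always, the fraction of records that include it.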
ISO path: /gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty
The Metadata Contact Responsibility gives the party that is responsible for creating and maintaining the metadata. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 2.
Table 2. Sources for Metadata Contact information
Dialect | Source |
DIF | /DIF/Personnel[Role="DIF AUTHOR"] |
ECHO |
|
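As an illustration of how the DIF source in Table 2 selects the metadata author, the following sketch applies the same predicate with Python's standard library. The sample record is hypothetical and omits the default namespace that real DIF records declare; the names and values in it are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free DIF fragment with two Personnel entries.
DIF_SAMPLE = """
<DIF>
  <Personnel>
    <Role>DIF AUTHOR</Role>
    <First_Name>Jane</First_Name>
    <Last_Name>Example</Last_Name>
  </Personnel>
  <Personnel>
    <Role>TECHNICAL CONTACT</Role>
    <Last_Name>Other</Last_Name>
  </Personnel>
</DIF>
""".strip()

root = ET.fromstring(DIF_SAMPLE)

# Relative form of /DIF/Personnel[Role="DIF AUTHOR"]: keep only the
# Personnel elements whose Role child has exactly this text.
authors = root.findall("Personnel[Role='DIF AUTHOR']")

for person in authors:
    print(person.findtext("First_Name"), person.findtext("Last_Name"))
```

Only the first Personnel entry matches, so the technical contact is not reported as a metadata contact.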
Table 3 shows the frequency of occurrence of elements of the Metadata Contact Responsibility / Parties for fifteen collections in the CMR. The data indicate that fourteen of the collections include the name of the organization that is responsible for the metadata (gmd:organisationName) and a role code (gmd:role/gmd:CI_RoleCode) for the organization. In general, no contact information is provided for these organizations, although the Goddard Space Flight Center (GSFC) Simple, Scalable, Script-based Science Processing Archive (GSFCS4PA) collection includes more complete metadata contact information in roughly 90% of the records.
Table 3. Occurrences of Metadata Contact Responsibility
Number of Records | 84 | 28 | 35 | 339 | 687 | 78 | 5 | 116 | 343 | 495 | 185 | 197 | 63 | 4 | 1135 | 375 | 11 | |
Concept | Element Path | ASF | CDDIS | GES_DISC | GHRC | GSFCS4PA | LAADS | LANCEAMSR2 | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDC_ECS | OB_DAAC | OMINRT | ORNL_DAAC | PODAAC | USGS_EROS |
gmd:organisationName | gmd:contact/gmd:organisationName | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | .34 | .45 | 1.00 | 1.00 | .93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ||
gmd:CI_RoleCode | gmd:contact/gmd:role/gmd:CI_RoleCode | 1.00 | 1.00 | 1.00 | 2.01 | 1.00 | .34 | .45 | 1.00 | 1.00 | .93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ||
gmd:individualName | gmd:contact/gmd:individualName | 1.01 | ||||||||||||||||
gmd:deliveryPoint | gmd:contact/gmd:contactInfo/gmd:address/gmd:deliveryPoint | 1.00 | ||||||||||||||||
gmd:city | gmd:contact/gmd:contactInfo/gmd:address/gmd:city | 1.00 | ||||||||||||||||
gmd:administrativeArea | gmd:contact/gmd:contactInfo/gmd:address/gmd:administrativeArea | 1.00 | ||||||||||||||||
gmd:postalCode | gmd:contact/gmd:contactInfo/gmd:address/gmd:postalCode | 1.00 | ||||||||||||||||
gmd:country | gmd:contact/gmd:contactInfo/gmd:address/gmd:country | 1.00 | ||||||||||||||||
gmd:voice | gmd:contact/gmd:contactInfo/gmd:phone/gmd:voice | 1.38 |
ISO path: /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/
The Resource Author / Originator Responsibility gives the party that is responsible for creating the dataset. This is typically the Principal Investigator for the project or their institution. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 4.
Table 4. Sources for Resource Author / Originator information.
Dialect | Source |
DIF |
|
ECHO |
|
Table 5 shows the frequency of occurrence of elements of the Resource Author / Originator Responsibility / Parties for fifteen collections in the CMR. The data indicate that seven of the fifteen collections include some of this information in between 2% and 100% of their records. Most collections that include any of this information provide contact details consistently (similar values for each collection across most rows of Table 5).
Table 5. Occurrences of Resource Author / Originator Responsibility
Path Elements | ASF | CDDIS | GHRC | GSFCS4PA | LAADS | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDCV0 | NSIDC_ECS | OMINRT | PODAAC | SEDAC | USGS_EROS |
gmd:contactInfo/ gmd:address/gmd:administrativeArea | .91 | .98 | 1.00 | .48 | .94 | .04 | 1.57 | ||||||||
gmd:contactInfo/ gmd:address/gmd:city | .91 | .98 | 1.00 | .48 | .94 | .04 | 1.57 | ||||||||
gmd:contactInfo/ gmd:address/gmd:country | .91 | .98 | 1.00 | .48 | .94 | .04 | 1.57 | ||||||||
gmd:contactInfo/ gmd:address/gmd:deliveryPoint | .91 | .98 | 1.00 | .48 | .94 | .04 | 1.57 | ||||||||
gmd:contactInfo/ gmd:address/gmd:electronicMailAddress | .95 | .98 | 1.00 | .48 | .94 | .03 | 1.68 | ||||||||
gmd:contactInfo/ gmd:address/gmd:postalCode | .91 | .98 | 1.00 | .48 | .94 | .04 | 1.57 | ||||||||
gmd:contactInfo/ gmd:contactInstructions | .98 | 1.00 | .48 | .94 | 1.47 | ||||||||||
gmd:contactInfo/ gmd:hoursOfService | .48 | 1.21 | |||||||||||||
gmd:contactInfo/ gmd:phone/gmd:facsimile | .15 | .05 | |||||||||||||
gmd:contactInfo/ gmd:phone/gmd:voice | 1.04 | .98 | 1.00 | .53 | .94 | .02 | 1.47 | ||||||||
gmd:individualName | 1.03 | .43 | .04 | 1.17 | |||||||||||
gmd:organisationName | .98 | 1.00 | .04 | .94 | .04 | .51 | |||||||||
gmd:positionName | .43 | .96 | |||||||||||||
gmd:role/gmd:CI_RoleCode | 1.03 | .98 | 1.00 | .48 | .94 | .04 | 1.68 |
ISO path: /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact/gmd:CI_ResponsibleParty/
The Point of Contact Responsibility gives the party that is responsible for answering scientific questions about the dataset. Often this is a data manager at the archive that houses the dataset. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 6.
Table 6. Sources for Point of Contact information.
Dialect | Source |
DIF |
|
ECHO |
|
Table 7 shows the frequency of occurrence of elements of the Point of Contact Responsibility / Parties for fifteen collections in the CMR. The data indicate that fourteen of the fifteen collections include the name of the responsible organization in 100% of their records. In general, no contact information is provided for these organizations. The Goddard Space Flight Center (GSFC) Simple, Scalable, Script-based Science Processing Archive (GSFCS4PA) collection includes more complete contact information, with address and e-mail elements present in roughly 57% of its records (Table 7).
Table 7. Occurrences of Point of Contact Responsibility
Path Elements | ASF | CDDIS | GHRC | GSFCS4PA | LAADS | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDCV0 | NSIDC_ECS | OMINRT | PODAAC | SEDAC | USGS_EROS |
gmd:organisationName | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
gmd:role/gmd:CI_RoleCode | 1.00 | 1.00 | 1.00 | 1.57 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
gmd:contactInfo/ gmd:address/gmd:administrativeArea | .56 | ||||||||||||||
gmd:contactInfo/ gmd:address/gmd:city | .56 | ||||||||||||||
gmd:contactInfo/ gmd:address/gmd:country | .56 | ||||||||||||||
gmd:contactInfo/ gmd:address/gmd:deliveryPoint | .56 | ||||||||||||||
gmd:contactInfo/ gmd:address/gmd:electronicMailAddress | .57 | ||||||||||||||
gmd:contactInfo/ gmd:address/gmd:postalCode | .56 | ||||||||||||||
gmd:contactInfo/ gmd:phone/gmd:voice | .82 | ||||||||||||||
gmd:individualName | .57 |
ISO path: /gmi:MI_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorContact/gmd:CI_ResponsibleParty/
The Distribution Contact Responsibility gives the party that is responsible for answering questions about the distribution of the dataset. This is typically the Archive Center for the dataset. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 8.
Table 8. Sources for Distribution Contact Responsibility information.
Dialect | Source |
DIF |
|
ECHO |
|
Table 9 shows the frequency of occurrence of elements of the Distribution Contact Responsibility / Parties for fifteen collections in the CMR. Twelve of the fifteen collections include the name of the responsible organization in almost 100% of their records. The amount of contact information provided for these organizations varies considerably from collection to collection. Most collections include e-mail addresses for the distributors.
Table 9. Occurrences of the Distributor Contact Responsibility
Path Elements | ASF | CDDIS | GHRC | GSFCS4PA | LAADS | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDCV0 | NSIDC_ECS | OMINRT | PODAAC | SEDAC | USGS_EROS |
gmd:distributorContact/ gmd:contactInfo/ gmd:address/gmd:administrativeArea | 1.00 | 1.14 | .98 | 1.00 | 1.32 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |||||
gmd:distributorContact/ gmd:contactInfo/ gmd:address/gmd:city | 1.00 | 1.14 | .98 | 1.00 | 1.32 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |||||
gmd:distributorContact/ gmd:contactInfo/ gmd:address/gmd:country | 1.00 | 1.14 | .98 | 1.00 | 1.32 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |||||
gmd:distributorContact/ gmd:contactInfo/ gmd:address/gmd:deliveryPoint | 1.00 | 1.14 | .98 | 1.00 | 1.32 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |||||
gmd:distributorContact/ gmd:contactInfo/ gmd:address/ gmd:electronicMailAddress | 1.00 | 1.00 | 1.00 | 1.14 | 1.00 | 1.00 | 1.32 | 1.00 | 1.00 | 1.00 | .99 | 1.00 | 1.00 | ||
gmd:distributorContact/ gmd:contactInfo/ gmd:address/gmd:postalCode | 1.00 | 1.14 | .98 | 1.00 | 1.32 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |||||
gmd:distributorContact/ gmd:contactInfo/ gmd:contactInstructions | .98 | 1.00 | 1.32 | 1.00 | .97 | 1.00 | |||||||||
gmd:distributorContact/ gmd:contactInfo/ gmd:hoursOfService | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ||||||||||
gmd:distributorContact/ gmd:contactInfo/ gmd:phone/gmd:facsimile | 1.00 | 1.00 | 1.00 | ||||||||||||
gmd:distributorContact/ gmd:contactInfo/ gmd:phone/gmd:voice | 1.00 | 2.00 | 2.28 | 1.96 | 3.00 | 1.32 | 1.00 | 1.00 | 1.00 | .99 | 2.00 | 4.00 | |||
gmd:distributorContact/ gmd:individualName | 1.00 | 1.14 | .33 | .03 | .99 | 1.00 | |||||||||
gmd:distributorContact/ gmd:positionName | .33 | .03 | .99 | ||||||||||||
gmd:distributorContact/ gmd:role/gmd:CI_RoleCode | 1.00 | 1.00 | 1.00 | 1.14 | 1.00 | 1.00 | 1.32 | 2.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
gmd:distributorContact/ gmd:organisationName | 1.00 | .00 | 1.00 | 1.00 | 1.00 | 2.00 | 1.00 | 1.00 | .97 | 1.00 | 1.00 | 1.00 |
ISO path: /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmd:LI_ProcessStep/gmd:processor
The Processor Responsibility gives the party that is responsible for processing the dataset. It is included in the lineage metadata. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 10.
Table 10. Sources for Processor Responsibility information.
Dialect | Source |
DIF |
|
ECHO |
|
Table 11 shows the frequency of occurrence of elements of the Processor Responsibility / Parties for fifteen collections in the CMR. The data indicate that nine of the fifteen collections include the name of the responsible organization in over 50% of their records and that none of the collections include contact information for the processors.
Table 11. Occurrences of Processor Responsibility
Path Elements | ASF | CDDIS | GHRC | GSFCS4PA | LAADS | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDCV0 | NSIDC_ECS | OMINRT | PODAAC | SEDAC | USGS_EROS |
gmd:processor/ gmd:organisationName | .62 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | .99 | 1.00 | ||||||
gmd:processor/ gmd:role/gmd:CI_RoleCode | .62 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | .99 | 1.00 |
The data shown above clearly indicate that the completeness of contact information varies significantly across collections and responsibilities. The standards all include extensive physical contact information, e.g. cities, addresses, and postal codes. This reflects the prevalence of physical mail delivery when these standards were created. Electronic delivery now dominates, so e-mail addresses are more likely to be helpful than physical addresses. We examined the occurrence of e-mail addresses for responsibilities in all 15 collections. Table 12 gives the results of this analysis. The data indicate that thirteen of the fifteen collections include e-mail addresses for the Distributor Responsibility but that most collections are missing e-mail addresses for the other responsibilities.
Table 12. Completeness of contact email addresses
Path Elements | ASF | CDDIS | GHRC | GSFCS4PA | LAADS | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDCV0 | NSIDC_ECS | OMINRT | PODAAC | SEDAC | USGS_EROS |
Metadata Contact | .91 | ||||||||||||||
Distribution Contact | 1.00 | 1.00 | 1.00 | 1.14 | 1.00 | 1.00 | 1.32 | 1.00 | 1.00 | 1.00 | .99 | 1.00 | 1.00 | ||
Resource Author / Originator | .95 | .98 | 1.00 | .48 | .94 | .03 | 1.68 | ||||||||
Point of Contact | .57 | ||||||||||||||
Processor |
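A per-record version of the e-mail check summarized in Table 12 can be sketched as follows. This is an assumption-laden illustration, not the analysis code behind the table: the dictionary and function names are hypothetical, the sample record is invented, and the paths are the ISO paths listed in this report written relative to the gmi:MI_Metadata root (with gmd:citation included in the citation path, as the ISO schema requires). Records translated from other dialects may use different element types, for example in lineage, and would need adjusted paths.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Responsibility name -> path to its CI_ResponsibleParty elements.
RESPONSIBILITY_PATHS = {
    "Metadata Contact": "gmd:contact/gmd:CI_ResponsibleParty",
    "Distribution Contact": (
        "gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/"
        "gmd:MD_Distributor/gmd:distributorContact/gmd:CI_ResponsibleParty"
    ),
    "Resource Author / Originator": (
        "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
        "gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty"
    ),
    "Point of Contact": (
        "gmd:identificationInfo/gmd:MD_DataIdentification/"
        "gmd:pointOfContact/gmd:CI_ResponsibleParty"
    ),
    "Processor": (
        "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/"
        "gmd:processStep/gmd:LI_ProcessStep/gmd:processor/"
        "gmd:CI_ResponsibleParty"
    ),
}


def email_counts(record_xml: str) -> dict:
    """Count e-mail address elements under each responsibility's parties."""
    root = ET.fromstring(record_xml)
    return {
        name: sum(
            len(party.findall(".//gmd:electronicMailAddress", NS))
            for party in root.findall(path, NS)
        )
        for name, path in RESPONSIBILITY_PATHS.items()
    }


# Hypothetical record with an e-mail address for the distributor only.
SAMPLE = (
    '<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi" '
    'xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    'xmlns:gco="http://www.isotc211.org/2005/gco">'
    "<gmd:distributionInfo><gmd:MD_Distribution><gmd:distributor>"
    "<gmd:MD_Distributor><gmd:distributorContact><gmd:CI_ResponsibleParty>"
    "<gmd:contactInfo><gmd:CI_Contact><gmd:address><gmd:CI_Address>"
    "<gmd:electronicMailAddress>"
    "<gco:CharacterString>help@example.org</gco:CharacterString>"
    "</gmd:electronicMailAddress>"
    "</gmd:CI_Address></gmd:address></gmd:CI_Contact></gmd:contactInfo>"
    "</gmd:CI_ResponsibleParty></gmd:distributorContact></gmd:MD_Distributor>"
    "</gmd:distributor></gmd:MD_Distribution></gmd:distributionInfo>"
    "</gmi:MI_Metadata>"
)

counts = email_counts(SAMPLE)
for name, n in counts.items():
    print(f"{name}: {n}")
```

Averaging these per-record counts over all records in a collection would give values comparable in form to the rows of Table 12.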