Interim Collection Comparison Report for Big Earth Data Initiative

Metadata Source: NASA Common Metadata Repository

Metadata Dialect: ISO 19115-2

Evaluation Target: People and Organizations

Ted Habermann – The HDF Group 

Metadata serves an essential function in connecting users to people and organizations to help them access, use, and understand data. The ESDIS Common Metadata Repository (CMR) recognizes the importance of this type of information and includes two related elements in the Unified Metadata Model (UMM) Common Profile: Responsibility and Party. As elements in the Common Profile, these elements are included in all other UMM Profiles.

The Responsibility element broadly defines responsibilities related to data resources using the position of the element in the metadata model hierarchy. The UMM-Common Profile defines five responsibilities: Metadata Contact, Resource Author / Originator, Point of Contact, Distributor, and Processor. Each of these responsibilities can have multiple people or organizations (termed parties) associated with it. A RoleCode that is chosen from the standard ISO Codelist describes details of the roles of those parties.

Understanding usage, completeness and consistency of Responsibilities and Parties in ESDIS metadata is an important first step towards providing consistent and complete services to users of those data. This report provides an initial assessment of these characteristics.

We examined 4180 metadata records from 17 collections extracted from the CMR during March 2015 and 2126 metadata records from 15 collections extracted from the CMR during October 2015 (Collections Examined). For each recommended responsibility we provide a table that gives the average number of occurrences per record of each element of the associated party in each of these collections. A value of 1.00 or more typically (although not necessarily) indicates that the element is included one or more times in each record in a collection. A value below 1.00 is typically the fraction of records in a collection that include the metadata element. Empty cells indicate values of 0: the element is completely missing from the collection.
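The distinction between an average occurrence count and the fraction of records that contain an element can be sketched in a few lines of Python. This is only an illustration, not the tooling used for this report: the helper names (`make_record`, `average_occurrences`, `fraction_with_element`) and the sample records are hypothetical, and the records are simplified (real ISO 19115-2 records wrap text values in gco:CharacterString).

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the gmd and gmi prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
}

ORG_PATH = "gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName"


def make_record(organisation_names):
    """Build a simplified, hypothetical record with one contact per name."""
    contacts = "".join(
        "<gmd:contact><gmd:CI_ResponsibleParty>"
        f"<gmd:organisationName>{name}</gmd:organisationName>"
        "</gmd:CI_ResponsibleParty></gmd:contact>"
        for name in organisation_names
    )
    return (
        f'<gmi:MI_Metadata xmlns:gmi="{NS["gmi"]}" xmlns:gmd="{NS["gmd"]}">'
        f"{contacts}</gmi:MI_Metadata>"
    )


def occurrence_counts(records, path):
    """Number of elements matching `path` in each record."""
    return [len(ET.fromstring(xml).findall(path, NS)) for xml in records]


def average_occurrences(records, path):
    """Mean number of matching elements per record (the value tabulated here)."""
    counts = occurrence_counts(records, path)
    return sum(counts) / len(counts) if counts else 0.0


def fraction_with_element(records, path):
    """Fraction of records that contain the element at least once."""
    counts = occurrence_counts(records, path)
    return sum(1 for n in counts if n > 0) / len(counts) if counts else 0.0


# Three hypothetical records: two contacts, one contact, no contact.
SAMPLE = [make_record(["A", "B"]), make_record(["C"]), make_record([])]
```

For this sample the average is 1.0 even though one of the three records has no organisationName at all, which is why a value of 1 or more only "typically" means that every record includes the element.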

Metadata Contact:

ISO path: /gmi:MI_Metadata/gmd:contact/gmd:CI_ResponsibleParty

The Metadata Contact Responsibility gives the party that is responsible for creating and maintaining the metadata. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 1.

Table 1. Sources for Metadata Contact information

| Dialect | Source |
|---|---|
| DIF | /DIF/Personnel[Role='DIF AUTHOR'] |
| ECHO | 1. /*/ArchiveCenter |
| ECHO | 2. /*/Contacts/Contact[contains(Role,'DIF AUTHOR')] |

 

Table 2 shows the frequency of occurrence of elements of the Metadata Contact Responsibility / Parties for seventeen collections in the CMR. The data indicate that sixteen of the collections include the name of the organization that is responsible for the metadata (gmd:organisationName) and a roleCode (gmd:role/gmd:CI_RoleCode) for the organization. In general, no contact information is provided for these organizations, although the Goddard Space Flight Center (GSFC) Simple, Scalable, Script-based Science Processing Archive (GSFCS4PA) collection includes more complete metadata contact information in its records.

This difference reflects the source of the information in ECHO. Metadata Contacts that originate as ArchiveCenter have only the organisationName while those that originate as contacts generally have more complete information.
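The two ECHO sources can be illustrated with a short Python sketch. It is an assumption-laden example rather than the CMR translation code: the function name `metadata_contact_sources`, the sample record, and the child element names other than ArchiveCenter, Contacts, Contact and Role (for example OrganizationName and Email) are hypothetical. Because the standard-library ElementTree does not support the XPath `contains()` function, the role test is done in Python.

```python
import xml.etree.ElementTree as ET


def metadata_contact_sources(echo_xml, role_text="DIF AUTHOR"):
    """List (source, name, detail_tags) for each Metadata Contact candidate.

    Mirrors the two ECHO sources in Table 1:
      1. /*/ArchiveCenter
      2. /*/Contacts/Contact[contains(Role, 'DIF AUTHOR')]
    """
    root = ET.fromstring(echo_xml)
    found = []

    # Source 1: ArchiveCenter carries only a name, no contact details.
    for center in root.findall("ArchiveCenter"):
        name = (center.text or "").strip()
        if name:
            found.append(("ArchiveCenter", name, []))

    # Source 2: Contact elements whose Role contains the requested text.
    for contact in root.findall("Contacts/Contact"):
        role = contact.findtext("Role", default="")
        if role_text in role:
            # Hypothetical child names; everything except Role counts as detail.
            details = sorted(c.tag for c in contact if c.tag != "Role")
            name = contact.findtext("OrganizationName", default="")
            found.append(("Contact", name, details))

    return found


# Hypothetical ECHO-style record used only for illustration.
SAMPLE_ECHO = """
<Collection>
  <ArchiveCenter>EXAMPLE_DAAC</ArchiveCenter>
  <Contacts>
    <Contact>
      <Role>DIF AUTHOR</Role>
      <OrganizationName>Example Organization</OrganizationName>
      <Email>help@example.org</Email>
    </Contact>
    <Contact>
      <Role>TECHNICAL CONTACT</Role>
      <OrganizationName>Other Organization</OrganizationName>
    </Contact>
  </Contacts>
</Collection>
"""

SOURCES = metadata_contact_sources(SAMPLE_ECHO)
```

In this sample the ArchiveCenter entry yields a name with no further detail, while the matching Contact also carries additional fields, which is the pattern described above.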

 

Table 2. Occurrences of Metadata Contact Responsibility

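| Concept | Element Path | ASF | CDDIS | GES_DISC | GHRC | GSFCS4PA | LAADS | LANCEAMSR2 | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDC_ECS | OB_DAAC | OMINRT | ORNL_DAAC | PODAAC | USGS_EROS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Records |  | 84 | 28 | 35 | 339 | 687 | 78 | 5 | 116 | 343 | 495 | 185 | 197 | 63 | 4 | 1135 | 375 | 11 |
| gmd:organisationName | gmd:contact/gmd:organisationName | 1.00 | 1.00 |  | 1.00 | 1.00 |  | 1.00 | .34 | .45 | 1.00 | 1.00 | .93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| gmd:CI_RoleCode | gmd:contact/gmd:role/gmd:CI_RoleCode | 1.00 | 1.00 |  | 1.00 | 2.01 |  | 1.00 | .34 | .45 | 1.00 | 1.00 | .93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| gmd:individualName | gmd:contact/gmd:individualName |  |  |  |  | 1.01 |  |  |  |  |  |  |  |  |  |  |  |  |
| gmd:deliveryPoint | gmd:contact/gmd:contactInfo/gmd:address/gmd:deliveryPoint |  |  |  |  | 1.00 |  |  |  |  |  |  |  |  |  |  |  |  |
| gmd:city | gmd:contact/gmd:contactInfo/gmd:address/gmd:city |  |  |  |  | 1.00 |  |  |  |  |  |  |  |  |  |  |  |  |
| gmd:administrativeArea | gmd:contact/gmd:contactInfo/gmd:address/gmd:administrativeArea |  |  |  |  | 1.00 |  |  |  |  |  |  |  |  |  |  |  |  |
| gmd:postalCode | gmd:contact/gmd:contactInfo/gmd:address/gmd:postalCode |  |  |  |  | 1.00 |  |  |  |  |  |  |  |  |  |  |  |  |
| gmd:country | gmd:contact/gmd:contactInfo/gmd:address/gmd:country |  |  |  |  | 1.00 |  |  |  |  |  |  |  |  |  |  |  |  |
| gmd:voice | gmd:contact/gmd:contactInfo/gmd:phone/gmd:voice |  |  |  |  | 1.38 |  |  |  |  |  |  |  |  |  |  |  |  |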

Resource Author / Originator:

ISO path: /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/

 

The Resource Author / Originator Responsibility gives the party that is responsible for creating the dataset. This is typically the Principal Investigator for the project or their institution. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 3.

Table 3. Sources for Resource Author / Originator information.

| Dialect | Source |
|---|---|
| DIF | 1. /DIF/Organization |
| DIF | 2. /DIF/Personnel[contains(Role,'Investigator')] |
| DIF | 3. /DIF/Data_Set_Citation/Dataset_Creator |
| ECHO | 1. /*/Contacts/Contact[contains(Role,'Data Originator')] |
| ECHO | 2. /*/Contacts/Contact[contains(Role,'Producer')] |
| ECHO | 3. /*/Contacts/Contact[contains(Role,'Investigator')] |
| ECHO | 4. /*/Contacts/Contact[contains(Role,'INVESTIGATOR')] |

 

Table 4 shows the frequency of occurrence of elements of the Resource Author / Originator Responsibility / Parties for seventeen collections in the CMR. The data indicate that five of the seventeen collections include some of this information.

 

Table 4. Occurrences of Resource Author / Originator Responsibility

| Concept | Element Path | ASF | CDDIS | GES_DISC | GHRC | GSFCS4PA | LAADS | LANCEAMSR2 | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDC_ECS | OB_DAAC | OMINRT | ORNL_DAAC | PODAAC | USGS_EROS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Records |  | 84 | 28 | 35 | 339 | 687 | 78 | 5 | 116 | 343 | 495 | 185 | 197 | 63 | 4 | 1135 | 375 | 11 |
| Phone | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:phone/gmd:voice |  |  |  |  | .86 |  |  | .34 | .49 |  | .84 | .94 |  |  |  |  |  |
| Delivery Point | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:address/gmd:deliveryPoint |  |  |  |  | .86 |  |  | .34 | .46 |  | .84 | 1.39 |  |  |  |  |  |
| Address | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:address/gmd:city |  |  |  |  | .86 |  |  | .34 | .46 |  | .84 | 1.39 |  |  |  |  |  |
| Administrative Area | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:address/gmd:administrativeArea |  |  |  |  | .86 |  |  | .34 | .46 |  | .84 | 1.39 |  |  |  |  |  |
| Postal Code | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:address/gmd:postalCode |  |  |  |  | .86 |  |  | .34 | .46 |  | .84 | 1.39 |  |  |  |  |  |
| Country | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:address/gmd:country |  |  |  |  | .86 |  |  | .34 | .46 |  | .84 | 1.39 |  |  |  |  |  |
| Role Code | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:role/gmd:CI_RoleCode |  |  |  |  | .97 |  |  | .34 | .46 |  | .84 | 1.49 |  |  |  |  |  |
| Organization Name | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:organisationName |  |  |  |  |  |  |  | .34 | .04 |  | .84 | .30 |  |  |  |  |  |
| Contact Instructions | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:contactInstructions |  |  |  |  |  |  |  | .34 | .46 |  | .84 | .93 |  |  |  |  |  |
| Individual Name | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:individualName |  |  |  |  | .97 |  |  |  | .42 |  |  | 1.18 |  |  |  |  |  |
| Position Name | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:positionName |  |  |  |  |  |  |  |  | .42 |  |  | .64 |  |  |  |  |  |
| Hours | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:hoursOfService |  |  |  |  |  |  |  |  | .46 |  |  | .80 |  |  |  |  |  |
| Facsimile | gmd:identificationInfo/gmd:citation/gmd:citedResponsibleParty/gmd:contactInfo/gmd:phone/gmd:facsimile |  |  |  |  |  |  |  |  | .12 |  |  | .02 |  |  |  |  |  |

Point of Contact:

ISO path: /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact/gmd:CI_ResponsibleParty/

 

The Point of Contact Responsibility gives the party that is responsible for answering scientific questions about the dataset. Often this is a data manager at the archive that houses the dataset. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 5.

 

Table 5. Sources for Point of Contact information.

| Dialect | Source |
|---|---|
| DIF | 1. /DIF/Organization |
| DIF | 2. /DIF/Personnel[contains(Role,'Technical Contact')] |
| ECHO | 1. /*/ArchiveCenter |
| ECHO | 2. /*/Contacts/Contact[contains(Role,'TECHNICAL CONTACT')] |

 

Table 6 shows the frequency of occurrence of elements of the Point of Contact Responsibility / Parties for seventeen collections in the CMR. The data indicate that fifteen of the collections include the name of the responsible organization in 100% of their records. In general, no contact information is provided for these organizations. The Goddard Space Flight Center (GSFC) Simple, Scalable, Script-based Science Processing Archive (GSFCS4PA) collection includes more complete contact information in most of its records.

This difference reflects the source of the information in ECHO. Points of Contact that originate as ArchiveCenter have only the organisationName while those that originate as contacts generally have more complete information.

Table 6. Occurrences of Point of Contact Responsibility

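| Concept | Element Path | ASF | CDDIS | GES_DISC | GHRC | GSFCS4PA | LAADS | LANCEAMSR2 | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDC_ECS | OB_DAAC | OMINRT | ORNL_DAAC | PODAAC | USGS_EROS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Records |  | 84 | 28 | 35 | 339 | 687 | 78 | 5 | 116 | 343 | 495 | 185 | 197 | 63 | 4 | 1135 | 375 | 11 |
| Organization Name | gmd:identificationInfo/gmd:pointOfContact/gmd:organisationName | 1.00 | 1.00 |  | 1.00 | 1.00 |  | 1.00 | .34 | .45 | 1.00 | 1.00 | .93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Role Code | gmd:identificationInfo/gmd:pointOfContact/gmd:role/gmd:CI_RoleCode | 1.00 | 1.00 |  | 1.00 | 1.63 |  | 1.00 | .34 | .45 | 1.00 | 1.00 | .93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Individual Name | gmd:identificationInfo/gmd:pointOfContact/gmd:individualName |  |  |  |  | .63 |  |  |  |  |  |  |  |  |  |  |  |  |
| Voice | gmd:identificationInfo/gmd:pointOfContact/gmd:contactInfo/gmd:phone/gmd:voice |  |  |  |  | .86 |  |  |  |  |  |  |  |  |  |  |  |  |
| Delivery Point | gmd:identificationInfo/gmd:pointOfContact/gmd:contactInfo/gmd:address/gmd:deliveryPoint |  |  |  |  | .63 |  |  |  |  |  |  |  |  |  |  |  |  |
| City | gmd:identificationInfo/gmd:pointOfContact/gmd:contactInfo/gmd:address/gmd:city |  |  |  |  | .63 |  |  |  |  |  |  |  |  |  |  |  |  |
| Address | gmd:identificationInfo/gmd:pointOfContact/gmd:contactInfo/gmd:address/gmd:administrativeArea |  |  |  |  | .63 |  |  |  |  |  |  |  |  |  |  |  |  |
| Postal Code | gmd:identificationInfo/gmd:pointOfContact/gmd:contactInfo/gmd:address/gmd:postalCode |  |  |  |  | .63 |  |  |  |  |  |  |  |  |  |  |  |  |
| Country | gmd:identificationInfo/gmd:pointOfContact/gmd:contactInfo/gmd:address/gmd:country |  |  |  |  | .63 |  |  |  |  |  |  |  |  |  |  |  |  |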

Distribution Contact:

ISO path: /gmi:MI_Metadata/gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorContact/gmd:CI_ResponsibleParty/

 

The Distribution Contact Responsibility gives the party responsible for answering questions about the distribution of the dataset. This is typically the Archive Center for the dataset. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 7.

 

Table 7. Sources for Distribution Contact Responsibility information.

| Dialect | Source |
|---|---|
| DIF | 1. /DIF/Data_Center/Data_Center_Name |
| ECHO | 1. /*/ArchiveCenter |
| ECHO | 2. /*/Contacts/Contact[contains(Role,'Archive')] |
| ECHO | 3. /*/Contacts/Contact[contains(Role,'DATA CENTER CONTACT')] |
| ECHO | 4. /*/Contacts/Contact[contains(Role,'Distributor')] |
| ECHO | 5. /*/Contacts/Contact[contains(Role,'User Services')] |
| ECHO | 6. /*/Contacts/Contact[contains(Role,'GHRC USER SERVICES')] |
| ECHO | 7. /*/Contacts/Contact[contains(Role,'ORNL DAAC User Services')] |

 

Table 8 shows the frequency of occurrence of elements of the Distribution Contact Responsibility / Parties for seventeen collections in the CMR. Eleven of the seventeen collections include the name of the responsible organization in almost 100% of their records. The amount of contact information provided for these organizations varies considerably.

 

Table 8. Occurrences of the Distributor Contact Responsibility

| Concept | Element Path | ASF | CDDIS | GES_DISC | GHRC | GSFCS4PA | LAADS | LANCEAMSR2 | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDC_ECS | OB_DAAC | OMINRT | ORNL_DAAC | PODAAC | USGS_EROS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Records |  | 84 | 28 | 35 | 339 | 687 | 78 | 5 | 116 | 343 | 495 | 185 | 197 | 63 | 4 | 1135 | 375 | 11 |
| gmd:CI_RoleCode | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:role/gmd:CI_RoleCode | 1.00 | 1.00 | 1.00 | 1.00 | 1.11 | 1.00 | 1.00 | 1.00 | 1.35 | 2.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| gmd:voice | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:phone/gmd:voice | 1.00 | 2.00 |  |  | 2.21 |  |  | 1.03 | .79 | 1.00 | 1.00 | .93 |  |  | 2.00 | 1.00 | 4.00 |
| gmd:organisationName | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:organisationName |  | 1.00 | 1.00 |  |  |  |  | .66 | 1.00 | 2.00 | 1.00 | .99 | 1.00 | 1.00 |  | 1.00 | 1.00 |
| gmd:deliveryPoint | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:address/gmd:deliveryPoint |  | 1.00 |  |  | 1.11 |  |  | .34 | .79 | 1.00 | 1.00 | .93 |  |  | 1.00 |  | 1.00 |
| gmd:city | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:address/gmd:city |  | 1.00 |  |  | 1.11 |  |  | .34 | .79 | 1.00 | 1.00 | .93 |  |  | 1.00 |  | 1.00 |
| gmd:administrativeArea | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:address/gmd:administrativeArea |  | 1.00 |  |  | 1.11 |  |  | .34 | .79 | 1.00 | 1.00 | .93 |  |  | 1.00 |  | 1.00 |
| gmd:postalCode | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:address/gmd:postalCode |  | 1.00 |  |  | 1.11 |  |  | .34 | .79 | 1.00 | 1.00 | .93 |  |  | 1.00 |  | 1.00 |
| gmd:country | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:address/gmd:country |  | 1.00 |  |  | 1.11 |  |  | .34 | .79 | 1.00 | 1.00 | .93 |  |  | 1.00 |  | 1.00 |
| gmd:individualName | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:individualName |  | 1.00 |  |  | 1.11 |  |  |  | .35 |  |  | .01 |  |  | 1.00 | 1.00 |  |
| gmd:contactInstructions | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:contactInstructions |  |  |  |  |  |  |  | .34 | .79 |  | 1.00 | .92 |  |  |  |  | 1.00 |
| gmd:hoursOfService | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:hoursOfService |  |  |  |  |  |  |  |  | .78 |  | 1.00 | .93 |  |  |  |  | 1.00 |
| gmd:positionName | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:positionName |  |  |  |  |  |  |  |  | .35 |  |  | .01 |  |  |  | 1.00 |  |
| gmd:facsimile | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:phone/gmd:facsimile |  |  |  |  |  |  |  |  | .78 |  | 1.00 | .93 |  |  |  |  |  |
| gmd:electronicMailAddress | gmd:distributionInfo/gmd:distributor/gmd:distributorContact/gmd:contactInfo/gmd:address/gmd:electronicMailAddress |  |  |  |  |  | .21 |  | .05 |  |  |  |  |  |  |  |  |  |

Processor:

ISO path: /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmd:LI_ProcessStep/gmd:processor

 

The Processor Responsibility gives the party that is responsible for processing the dataset. It is included in the lineage metadata. The XPath of the source of this information in the DIF and ECHO dialects is given in Table 9.

 

Table 9. Sources for Processor Responsibility information.

| Dialect | Source |
|---|---|
| DIF | 1. TBD |
| ECHO | 1. /*/ProcessingCenter |

 

Table 10 shows the frequency of occurrence of elements of the Processor Responsibility / Parties for seventeen collections in the CMR. The data indicate that ten of the seventeen collections include the name of the responsible organization in most of their records and that none of the collections include contact information for the processors. 

 

Table 10. Occurrences of Processor Responsibility

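| Concept | Element Path | ASF | CDDIS | GES_DISC | GHRC | GSFCS4PA | LAADS | LANCEAMSR2 | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDC_ECS | OB_DAAC | OMINRT | ORNL_DAAC | PODAAC | USGS_EROS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Records |  | 84 | 28 | 35 | 339 | 687 | 78 | 5 | 116 | 343 | 495 | 185 | 197 | 63 | 4 | 1135 | 375 | 11 |
| gmd:CI_RoleCode | gmd:dataQualityInfo/gmd:lineage/gmd:processStep/gmd:processor/gmd:role/gmd:CI_RoleCode | 1.00 |  | 1.00 |  | .57 | 1.00 |  | 1.00 | 1.00 |  | 1.00 | 1.00 |  |  | 1.00 | 1.00 | 1.00 |
| gmd:organisationName | gmd:dataQualityInfo/gmd:lineage/gmd:processStep/gmd:processor/gmd:organisationName | 1.00 |  |  |  | .57 | 1.00 |  | .69 | .45 |  | 1.00 | 1.00 |  |  | 1.00 | 1.00 | 1.00 |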

E-Mail Addresses

The data shown above indicate that the completeness of contact information varies significantly across collections and responsibilities. The standards all include extensive physical contact information, e.g. cities, addresses, and postal codes. This reflects the prevalence of physical mail delivery when these standards were created. Electronic delivery now dominates, so e-mail addresses are more likely to be helpful than physical addresses. We examined the occurrence of e-mail addresses for each responsibility in all 15 collections extracted in October 2015. Table 12 gives the results of this analysis. The data indicate that thirteen of the fifteen collections include e-mail addresses for the Distribution Contact Responsibility but that most collections are missing e-mail addresses for the other responsibilities.
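An e-mail completeness check for a single responsibility might look like the following Python sketch. It is a hypothetical illustration, not the analysis code behind this report: `email_stats`, `make_record` and the sample addresses are invented, and the sample records are simplified (no gco:CharacterString wrappers). It reports both the mean number of gmd:electronicMailAddress elements per record and the fraction of records with at least one.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the gmd and gmi prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
}

# Distribution Contact responsibility, relative to the gmi:MI_Metadata root.
DISTRIBUTOR_PATH = (
    "gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/"
    "gmd:MD_Distributor/gmd:distributorContact/gmd:CI_ResponsibleParty"
)


def make_record(emails):
    """Build a simplified, hypothetical record with one distributor contact."""
    addresses = "".join(
        f"<gmd:electronicMailAddress>{e}</gmd:electronicMailAddress>"
        for e in emails
    )
    return (
        f'<gmi:MI_Metadata xmlns:gmi="{NS["gmi"]}" xmlns:gmd="{NS["gmd"]}">'
        "<gmd:distributionInfo><gmd:MD_Distribution><gmd:distributor>"
        "<gmd:MD_Distributor><gmd:distributorContact>"
        "<gmd:CI_ResponsibleParty><gmd:contactInfo><gmd:CI_Contact>"
        f"<gmd:address><gmd:CI_Address>{addresses}</gmd:CI_Address>"
        "</gmd:address></gmd:CI_Contact></gmd:contactInfo>"
        "</gmd:CI_ResponsibleParty></gmd:distributorContact>"
        "</gmd:MD_Distributor></gmd:distributor></gmd:MD_Distribution>"
        "</gmd:distributionInfo></gmi:MI_Metadata>"
    )


def email_stats(records, responsibility_path):
    """Return (mean e-mail addresses per record, fraction of records with any)."""
    counts = []
    for xml_text in records:
        root = ET.fromstring(xml_text)
        # Count e-mail elements anywhere below each matching party.
        counts.append(
            sum(
                len(party.findall(".//gmd:electronicMailAddress", NS))
                for party in root.findall(responsibility_path, NS)
            )
        )
    if not counts:
        return 0.0, 0.0
    mean = sum(counts) / len(counts)
    fraction = sum(1 for n in counts if n > 0) / len(counts)
    return mean, fraction


# Four hypothetical records: two, one, zero and zero e-mail addresses.
SAMPLE = [
    make_record(["a@example.org", "b@example.org"]),
    make_record(["c@example.org"]),
    make_record([]),
    make_record([]),
]
MEAN, FRACTION = email_stats(SAMPLE, DISTRIBUTOR_PATH)
```

Reporting the fraction alongside the mean separates collections where a few records carry several addresses from those where every record carries one.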

 

Table 12. Completeness of contact email addresses

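| Path Elements | ASF | CDDIS | GHRC | GSFCS4PA | LAADS | LANCEMODIS | LARC | LARC_ASDC | LPDAAC_ECS | NSIDCV0 | NSIDC_ECS | OMINRT | PODAAC | SEDAC | USGS_EROS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Metadata Contact |  |  |  | .91 |  |  |  |  |  |  |  |  |  |  |  |
| Distribution Contact | 1.00 | 1.00 | 1.00 | 1.14 | 1.00 | 1.00 | 1.32 | 1.00 | 1.00 |  | 1.00 |  | .99 | 1.00 | 1.00 |
| Resource Author / Originator |  |  |  | .95 | .98 | 1.00 | .48 |  | .94 | .03 | 1.68 |  |  |  |  |
| Point of Contact |  |  |  | .57 |  |  |  |  |  |  |  |  |  |  |  |
| Processor |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |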

 

 

 
