- Author(s): user-b9cd4, Erich Reiter
Description of Problem
- Roles:
- pick lists include roles that apply to organizations and roles that apply to individuals
- role values are not the same for DIF and for ECHO
- current collection records in CMR have non-standardized values for roles, including different variations of the same word (ARCHIVER, archive, archive data center) and different words for the same concept (PRODUCER, PROCESSOR?)
- ECHO 10 records have Organization role built into field names (Processing Center, Archive Center) rather than as field values
- EDSC shows Processing Center and Archiving Center on Collection Information panel, but these are blank for DIF collections
- Contact Types:
- contact type lists associated with organizations and personnel are not standardized
- contact type lists associated with organizations and personnel do not include 'modern' values
- Normalization:
- organization and personnel information are repeated in multiple CMR collection records; in GCMD, they are normalized
(The normalization issues will be addressed later, in ECSE-91 and ECSE-99.)
Background
Recommendations
1. Roles
a. RECOMMENDATION 1: Distinguish between Roles for Organizations (call them Data Centers) and Roles for Personnel or Personnel Groups (call them Data Contacts), i.e.,
change ResponsibilityRoleEnum to two enums: DataCenterRoleEnum and DataContactRoleEnum.
The current ResponsibilityRoleEnum list in the UMM-Common schema mixes organization roles and personnel roles:
"enum": ["RESOURCEPROVIDER", "CUSTODIAN", "OWNER", "USER", "DISTRIBUTOR", "ORIGINATOR", "POINTOFCONTACT", "PRINCIPALINVESTIGATOR", "PROCESSOR", "PUBLISHER", "AUTHOR", "SPONSOR", "COAUTHOR", "COLLABORATOR", "EDITOR", "MEDIATOR", "RIGHTSHOLDER", "CONTRIBUTOR", "FUNDER", "STAKEHOLDER"]
(NOTE: This corresponds to the CI_RoleCode code list in the UMM-Common document, Figure 3.)
b. RECOMMENDATION 2: Assign the VALUES to DataCenterRoleEnum and DataContactRoleEnum as follows:
i. For Organizations (Data Centers), use: Archiver, Processor, Distributor, and Originator
i.e., DataCenterRoleEnum = "enum": ["ARCHIVER", "DISTRIBUTOR", "ORIGINATOR", "PROCESSOR"]
Note: The GCMD Organization Type ENUM is:
<xs:enumeration value="DISTRIBUTOR"/>
<xs:enumeration value="ARCHIVER"/>
<xs:enumeration value="ORIGINATOR"/>
<xs:enumeration value="PROCESSOR"/>
- CMR-2706: Error because Organization was set to ARCHIVER.
ii. For Personnel (Data Contacts), use:
DataContactRoleEnum = "enum": ["DATA CENTER CONTACT", "TECHNICAL CONTACT", "SCIENCE CONTACT", "INVESTIGATOR", "METADATA AUTHOR", "USER SERVICES", "SCIENCE SOFTWARE"]
NOTE: The GCMD Role ENUMs are at https://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon.xsd
The GCMD Personnel Role ENUM is:
<xs:enumeration value="INVESTIGATOR"/>
<xs:enumeration value="INVESTIGATOR, TECHNICAL CONTACT"/>
<xs:enumeration value="METADATA AUTHOR"/>
<xs:enumeration value="METADATA AUTHOR, TECHNICAL CONTACT"/>
<xs:enumeration value="TECHNICAL CONTACT"/>
NOTE: The GCMD OrganizationPersonnelRole ENUM is <xs:enumeration value="DATA CENTER CONTACT"/>
- ECSE-98
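As a minimal sketch of Recommendations 1 and 2, the two proposed enums can be written as separate Python enums whose values are copied from the lists above. The helper `split_gcmd_personnel_role` is hypothetical: it assumes that a combined GCMD personnel value such as "INVESTIGATOR, TECHNICAL CONTACT" should become two separate DataContact roles, which this page does not specify.

```python
from enum import Enum
from typing import List


class DataCenterRoleEnum(str, Enum):
    """Proposed roles for organizations (Data Centers), per Recommendation 2."""
    ARCHIVER = "ARCHIVER"
    DISTRIBUTOR = "DISTRIBUTOR"
    ORIGINATOR = "ORIGINATOR"
    PROCESSOR = "PROCESSOR"


class DataContactRoleEnum(str, Enum):
    """Proposed roles for personnel and personnel groups (Data Contacts)."""
    DATA_CENTER_CONTACT = "DATA CENTER CONTACT"
    TECHNICAL_CONTACT = "TECHNICAL CONTACT"
    SCIENCE_CONTACT = "SCIENCE CONTACT"
    INVESTIGATOR = "INVESTIGATOR"
    METADATA_AUTHOR = "METADATA AUTHOR"
    USER_SERVICES = "USER SERVICES"
    SCIENCE_SOFTWARE = "SCIENCE SOFTWARE"


def split_gcmd_personnel_role(value: str) -> List[DataContactRoleEnum]:
    """Hypothetical helper: split a combined GCMD role such as
    'INVESTIGATOR, TECHNICAL CONTACT' into individual DataContact roles.

    Raises ValueError if any part is not a DataContactRoleEnum value.
    """
    return [DataContactRoleEnum(part.strip()) for part in value.split(",") if part.strip()]


# No value belongs to both lists, so every role is unambiguously either a
# Data Center role or a Data Contact role.
assert not {r.value for r in DataCenterRoleEnum} & {r.value for r in DataContactRoleEnum}
```

For example, `split_gcmd_personnel_role("METADATA AUTHOR, TECHNICAL CONTACT")` returns `[DataContactRoleEnum.METADATA_AUTHOR, DataContactRoleEnum.TECHNICAL_CONTACT]`, while an ISO-only value such as "FUNDER" is rejected with a `ValueError`.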
c. RECOMMENDATION 3: Map current role values in the CMR to the new proposed role values as follows:
Current distribution of Role Values
Organization Role mapping:
Current values:
Proposed value mapping (in some cases it may be necessary to check with the provider to confirm that the mapping is satisfactory):
UMM-C | DIF 9 | DIF 10 | ECHO 10 | ISO |
---|---|---|---|---|
ARCHIVER | ARCHIVER | ARCHIVER | ARCHIVER; Archive; archiving data center; CUSTODIAN; Data Manager; internal data center | |
DISTRIBUTOR | DISTRIBUTOR | DISTRIBUTOR | | |
ORIGINATOR | ORIGINATOR | Data Originator; author | | |
PROCESSOR | PROCESSOR | PROCESSOR; Producer | | |
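The organization role mapping above amounts to a lookup table. The sketch below is one way to apply it during reconciliation, assuming matching should ignore case and extra whitespace (an assumption, not a stated rule); "archive data center" is included because the problem description lists it as a variant of ARCHIVER. Values not in the table return `None` so they can be reviewed with the provider.

```python
from typing import Optional

# Current organization role values (from the table above and the problem
# description), keyed in upper case so the lookup ignores case.
ORGANIZATION_ROLE_MAP = {
    "ARCHIVER": "ARCHIVER",
    "ARCHIVE": "ARCHIVER",
    "ARCHIVE DATA CENTER": "ARCHIVER",
    "ARCHIVING DATA CENTER": "ARCHIVER",
    "CUSTODIAN": "ARCHIVER",
    "DATA MANAGER": "ARCHIVER",
    "INTERNAL DATA CENTER": "ARCHIVER",
    "DISTRIBUTOR": "DISTRIBUTOR",
    "ORIGINATOR": "ORIGINATOR",
    "DATA ORIGINATOR": "ORIGINATOR",
    "AUTHOR": "ORIGINATOR",
    "PROCESSOR": "PROCESSOR",
    "PRODUCER": "PROCESSOR",
}


def map_organization_role(current: str) -> Optional[str]:
    """Return the proposed DataCenterRoleEnum value for a current role value,
    or None if the value is not in the table and needs provider review."""
    # Collapse runs of whitespace and upper-case before the lookup.
    key = " ".join(current.split()).upper()
    return ORGANIZATION_ROLE_MAP.get(key)
```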
Personnel Role mapping:
Current values:
Proposed value mapping (in some cases it may be necessary to check with the provider to confirm that the mapping is appropriate):
UMM-C | CURRENT VALUES |
---|---|
DATA CENTER CONTACT | DATA CENTER CONTACT; Primary Contact |
TECHNICAL CONTACT | TECHNICAL CONTACT; Product Team Leader |
SCIENCE CONTACT | Technical Contact for Science; GLAS Science Team Leader; ICESAT Project Scientist |
INVESTIGATOR | INVESTIGATOR; Associate Principal Investigator |
METADATA AUTHOR | DIF AUTHOR; METADATA AUTHOR; TECHNICAL CONTACT, DIF AUTHOR |
USER SERVICES | NSIDC USER Services; User Services; GHRC USER SERVICES |
SCIENCE SOFTWARE | Science Software Development Manager; Deputy Science Software Development Manager; Sea Ice Algorithms; Snow Algorithms |
- ECSE-75
- UMMC-412
- UMMC-435
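The personnel role mapping can be applied the same way. In this sketch the whole value is looked up before anything else, so the combined value "TECHNICAL CONTACT, DIF AUTHOR" maps to METADATA AUTHOR as in the table rather than being split. The helper `unmapped_values` is hypothetical; it collects values that fall outside the table so they can be raised with the provider.

```python
from typing import Iterable, List, Optional

# Current personnel role values from the table above, keyed in upper case.
PERSONNEL_ROLE_MAP = {
    "DATA CENTER CONTACT": "DATA CENTER CONTACT",
    "PRIMARY CONTACT": "DATA CENTER CONTACT",
    "TECHNICAL CONTACT": "TECHNICAL CONTACT",
    "PRODUCT TEAM LEADER": "TECHNICAL CONTACT",
    "TECHNICAL CONTACT FOR SCIENCE": "SCIENCE CONTACT",
    "GLAS SCIENCE TEAM LEADER": "SCIENCE CONTACT",
    "ICESAT PROJECT SCIENTIST": "SCIENCE CONTACT",
    "INVESTIGATOR": "INVESTIGATOR",
    "ASSOCIATE PRINCIPAL INVESTIGATOR": "INVESTIGATOR",
    "DIF AUTHOR": "METADATA AUTHOR",
    "METADATA AUTHOR": "METADATA AUTHOR",
    "TECHNICAL CONTACT, DIF AUTHOR": "METADATA AUTHOR",
    "NSIDC USER SERVICES": "USER SERVICES",
    "USER SERVICES": "USER SERVICES",
    "GHRC USER SERVICES": "USER SERVICES",
    "SCIENCE SOFTWARE DEVELOPMENT MANAGER": "SCIENCE SOFTWARE",
    "DEPUTY SCIENCE SOFTWARE DEVELOPMENT MANAGER": "SCIENCE SOFTWARE",
    "SEA ICE ALGORITHMS": "SCIENCE SOFTWARE",
    "SNOW ALGORITHMS": "SCIENCE SOFTWARE",
}


def map_personnel_role(current: str) -> Optional[str]:
    """Return the proposed DataContactRoleEnum value, or None if unknown."""
    key = " ".join(current.split()).upper()
    return PERSONNEL_ROLE_MAP.get(key)


def unmapped_values(values: Iterable[str]) -> List[str]:
    """Hypothetical helper: distinct values that need review with the provider."""
    return sorted({v for v in values if map_personnel_role(v) is None})
```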
d. RECOMMENDATION 4:
Map ECHO 10 Organization metadata to UMM-C as follows:
/Collection/ProcessingCenter=<value> maps to DataCenter/Role=PROCESSOR and DataCenter/ShortName = <shortname corresponding to Processing Center value> and DataCenter/LongName = <longname corresponding to Processing Center value>
/Collection/ArchiveCenter=<value> maps to DataCenter/Role=ARCHIVER and DataCenter/ShortName = <shortname corresponding to Archive Center value> and DataCenter/LongName = <longname corresponding to Archive Center value>
- CMR-600
ISSUE: Write a ticket to determine how to map the <value> of Processing Center or Archive Center in the ECHO 10 record to the corresponding ShortName and LongName. (NEED A LIST OF ACTUAL VALUES IN ECHO 10 RECORDS FOR PROCESSING CENTER AND ARCHIVE CENTER.)
EXAMPLES:
In UAT, for Collection with Collection Shortname AQUARIOUS_L4_OISSS_IPRC_7DAY_V4, Version 1:
<ArchiveCenter>PO.DAAC</ArchiveCenter> should translate to DataCenter/Role=ARCHIVER and DataCenter/ShortName = NASA/JPL/PODAAC
<ProcessingCenter>IPRC/SOEST University of Hawaii, Manoa</ProcessingCenter> should translate to DataCenter/Role=PROCESSOR and DataCenter/ShortName = UHI/SOEST/IPRC
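A minimal sketch of Recommendation 4 using Python's standard `xml.etree.ElementTree`. The lookup table `CENTER_SHORT_NAMES` is hypothetical and holds only the two UAT examples above; building the full table is the open ISSUE. When no mapping is known, the sketch keeps the raw ECHO 10 value as the short name so the record can be flagged for review, and it leaves LongName out because its source is not yet decided.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical lookup from ECHO 10 free-text center values to KMS short names;
# only the UAT examples from this page are filled in.
CENTER_SHORT_NAMES: Dict[str, str] = {
    "PO.DAAC": "NASA/JPL/PODAAC",
    "IPRC/SOEST University of Hawaii, Manoa": "UHI/SOEST/IPRC",
}

# ECHO 10 element name -> proposed DataCenter/Role (Recommendation 4).
ECHO10_CENTER_ROLES = {"ProcessingCenter": "PROCESSOR", "ArchiveCenter": "ARCHIVER"}


def echo10_data_centers(collection_xml: str) -> List[Dict[str, str]]:
    """Translate /Collection/ProcessingCenter and /Collection/ArchiveCenter
    into DataCenter entries with a Role and a ShortName."""
    root = ET.fromstring(collection_xml)  # root element is <Collection>
    centers = []
    for element_name, role in ECHO10_CENTER_ROLES.items():
        value = root.findtext(element_name)
        if value is None or not value.strip():
            continue  # element absent or empty: nothing to translate
        value = value.strip()
        centers.append({
            "Role": role,
            # Fall back to the raw value when no mapping is known yet.
            "ShortName": CENTER_SHORT_NAMES.get(value, value),
        })
    return centers
```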
e. RECOMMENDATION 5:
Map DIF 10 metadata to UMM-C as follows:
UMM-C DataCenter/Role maps to DIF 10 Organization/Organization_Type
UMM-C DataCenter/ShortName maps to DIF 10 Organization/Organization_Name/Short_Name
EXAMPLE:
<Organization><Organization_Type>DISTRIBUTOR</Organization_Type> <Organization_Name> <Short_Name>CA/EC/MSC</Short_Name> </Organization_Name></Organization> maps to
DataCenter/Role=DISTRIBUTOR and DataCenter/ShortName = CA/EC/MSC
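The DIF 10 mapping in Recommendation 5 can be sketched with `xml.etree.ElementTree` as well. Two assumptions are labeled in the code: tag namespaces are stripped in case the record declares an XML namespace, and if an Organization lists more than one Organization_Type, the sketch emits one DataCenter entry per type.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _strip_namespaces(root: ET.Element) -> None:
    """Drop '{namespace}' prefixes from tag names so plain paths match
    (assumption: the record may declare a default XML namespace)."""
    for element in root.iter():
        if isinstance(element.tag, str) and "}" in element.tag:
            element.tag = element.tag.split("}", 1)[1]


def dif10_data_centers(dif_xml: str) -> List[Dict[str, str]]:
    """Map DIF 10 Organization elements to DataCenter Role/ShortName pairs."""
    root = ET.fromstring(dif_xml)
    _strip_namespaces(root)
    centers = []
    for org in root.iter("Organization"):
        short_name = (org.findtext("Organization_Name/Short_Name") or "").strip()
        # Assumption: one DataCenter entry per Organization_Type value.
        for org_type in org.findall("Organization_Type"):
            centers.append({"Role": (org_type.text or "").strip(), "ShortName": short_name})
    return centers
```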
2. Contact Type
RECOMMENDATION: Create a schema ENUM list that combines the current GCMD list with three additional contact types proposed by MMT developers: Email, Facebook, and Twitter.
a. The GCMD has the following ENUM list for Contact Type.
<xs:enumeration value="Direct Line"/>
<xs:enumeration value="Primary"/>
<xs:enumeration value="Telephone"/>
<xs:enumeration value="Fax"/>
<xs:enumeration value="Mobile"/>
<xs:enumeration value="Modem"/>
<xs:enumeration value="TDD/TTY Phone"/>
<xs:enumeration value="U.S. toll free"/>
<xs:enumeration value="Other"/>
This recommendation has already been implemented on the MMT in MMT-538.
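A minimal sketch of the combined Contact Type list: the nine GCMD values above plus the three MMT additions. The enum name `ContactMechanismTypeEnum` is hypothetical, and the validator assumes values must match the schema enum exactly, including case, as a JSON or XML schema enumeration would require.

```python
from enum import Enum


class ContactMechanismTypeEnum(str, Enum):
    """Hypothetical name for the proposed combined Contact Type enum."""
    # Current GCMD Contact Type list
    DIRECT_LINE = "Direct Line"
    PRIMARY = "Primary"
    TELEPHONE = "Telephone"
    FAX = "Fax"
    MOBILE = "Mobile"
    MODEM = "Modem"
    TDD_TTY_PHONE = "TDD/TTY Phone"
    US_TOLL_FREE = "U.S. toll free"
    OTHER = "Other"
    # Additional types proposed by MMT developers
    EMAIL = "Email"
    FACEBOOK = "Facebook"
    TWITTER = "Twitter"


def is_valid_contact_type(value: str) -> bool:
    """Exact (case-sensitive) match against the enum values."""
    return value in {t.value for t in ContactMechanismTypeEnum}
```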
Changes to UMM-C fields and UMM-Common fields:
See the proposed class diagram in Lucidchart for the element definitions, their types, and cardinality (you may need to log in to Lucidchart).
The mapping from UMM-C OLD to UMM-C Proposed is in the attached spreadsheet.
- CMR-2298
- ECSE-75
- CMR-2265
- CMR-1256
- UMMC-336
Mappings to DIF, ECHO, ISO
The DIF-UMM-ECHO_Mapping.xlsx file can be found here: https://git.earthdata.nasa.gov/projects/EMFD/repos/unified-metadata-model
- ECSE-45
Interoperability Considerations
During the original UMM-C review, it was decided that Organization and Personnel elements would be combined into a merged element called Responsible Party, similar to the ISO 19115-2 field CI_ResponsibleParty. Later, during the UMM-Common review, it was decided to separate the role from the party, to allow for components or xlinks and to make the party element reusable.
The MMT collection forms are organized based on the current UMM-C/UMM-Common metadata model. During MMT alpha and beta testing, some metadata providers indicated that they were confused by the Organization, Personnel and Party elements on the MMT collection record forms. Based on that feedback, we revisited the metadata model for these field groupings, giving consideration to how the metadata model can best reflect the physical model of organizations responsible for earth science datasets and people or groups who can be contacted with questions about those datasets. Based on this further analysis, we are now recommending a model which moves away from the Organization, Responsibility, and Party concepts, and moves back to the Data Center concept that better reflects the physical model of NASA DAACs and makes translation between GCMD collection record values and UMM-C values more straightforward and intuitive.
In addition, we are recommending changing the current UMM-C Personnel element to DataContact, where 'contact' is the operative word. Renaming this element to emphasize the function of a contact person, together with the new role enum assigned to DataContact, better addresses a dataset user's need to know whom to contact with different types of questions about the dataset (e.g., TECHNICAL CONTACT, SCIENCE CONTACT, USER SERVICES).
The current UMM-C document recommends that Roles be validated against the ISO 19115-1 CI_RoleCode code list (the same values as the current ResponsibilityRoleEnum listed under Recommendation 1).
However, these ISO roles co-mingle data center roles and people (data contact) roles, and a number of the ISO roles (e.g., funder, rightsHolder) are not relevant in the context of the Data Centers and Data Contacts in the CMR earth science data sets. We recommend that, instead of using the ISO roles, we use one role enum list for data centers and a separate role enum list for data contacts, as described in Recommendations 1 and 2. Our recommended enum list for Data Center roles includes ARCHIVER, which is not in the ISO list but reflects the role of the Distributed Active Archive Centers (DAACs) that NASA uses to manage earth science data.
Mappings from DataCenter and DataContact elements to ISO elements will be provided in the UMM-C/UMM-Common documents.
Changes to CMR
The new elements will need to be added to the CMR and indexed wherever the corresponding old elements were indexed. Several element names have changed, so translation code for the new names will have to be implemented.
- CMR-2706
- CMR-1473
- CMR-1841
- MMT-538
- CMR-3157
- CMR-3158
- CMR-3161
- CMR-3208
- CMR-3233
What should the MMT forms look like?
- MMT-489
- MMT-538
- ECSE-80
- MMT-689
- MMT-690
How should the values in these fields be presented to the user on the EDSC?
- EDSC-999
- EDSC-942
- EDSC-1141: Organization facet: change the facet name from 'Organization' to 'Data Center'; this reverses a decision made in EDSC-907.
On the Collection details page, EDSC currently shows Processing Center and Archive Center (blank for DIF collections).
- EDSC-1140
a. Instead of 'Processing Center' and 'Archive Center', show <DataCenter/Role>: <DataCenter/ShortName> or <DataCenter/ShortName> (<DataCenter/Role>)
e.g. ARCHIVER: GHRC or GHRC (ARCHIVER)
b. Show all Data Centers listed in the collection metadata record
c. Also display ServiceHours, Contact_Instructions, and RelatedURL for each Data Center, if these fields are present
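The display rules a to c can be sketched as a small formatting function. This is a simplification for illustration: each Data Center is assumed to be a flat dictionary keyed by the field names used on this page, whereas real UMM-C records nest ServiceHours and related fields under ContactInformation.

```python
from typing import Dict, List

# Optional Data Center fields to show when present (item c above).
OPTIONAL_FIELDS = ("ServiceHours", "Contact_Instructions", "RelatedURL")


def data_center_lines(center: Dict[str, str], role_first: bool = True) -> List[str]:
    """Render one Data Center as display lines.

    role_first=True gives 'ARCHIVER: GHRC'; False gives 'GHRC (ARCHIVER)'.
    """
    role, short_name = center["Role"], center["ShortName"]
    header = f"{role}: {short_name}" if role_first else f"{short_name} ({role})"
    lines = [header]
    for field in OPTIONAL_FIELDS:
        if center.get(field):
            lines.append(f"  {field}: {center[field]}")
    return lines


def all_data_center_lines(centers: List[Dict[str, str]], role_first: bool = True) -> List[str]:
    """Render every Data Center listed in the record (item b above)."""
    lines: List[str] = []
    for center in centers:
        lines.extend(data_center_lines(center, role_first))
    return lines
```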
- EDSC-1142
a. Show all Data Contacts, as <DataContact/Role>: <DataContact/ContactPerson/First_Name> <DataContact/ContactPerson/Middle_Name> <DataContact/ContactPerson/Last_Name> for a person, or as <DataContact/Role>: <DataContact/ContactGroup/GroupName> for a group,
and show all contact mechanisms for the data contact as
<DataContact/ContactInformation/ContactMechanism/Type>: <DataContact/ContactInformation/ContactMechanism/Value>
e.g. SCIENCE CONTACT: John Doe, Telephone: (555)1212, Email: johndoe@gmail.com
or USER SERVICES: DAAC User Services, Email: daacuserservices@daacname.gov
b. Also display ServiceHours, Contact_Instructions, and RelatedURL for each Data Contact, if these fields are present
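The Data Contact display rule can be sketched the same way. The dictionary shape is an assumption that follows the field paths written above (ContactPerson with First_Name, Middle_Name, and Last_Name, or ContactGroup with GroupName, plus a ContactMechanism list). Missing name parts are skipped, so a contact without a middle name renders as "John Doe".

```python
from typing import Any, Dict


def contact_display_name(contact: Dict[str, Any]) -> str:
    """Person: 'First Middle Last' with missing parts skipped; group: GroupName."""
    person = contact.get("ContactPerson")
    if person:
        parts = (person.get("First_Name"), person.get("Middle_Name"), person.get("Last_Name"))
        return " ".join(p for p in parts if p)
    return contact["ContactGroup"]["GroupName"]


def format_data_contact(contact: Dict[str, Any]) -> str:
    """Role and name, followed by every contact mechanism as 'Type: Value'."""
    pieces = [f"{contact['Role']}: {contact_display_name(contact)}"]
    mechanisms = contact.get("ContactInformation", {}).get("ContactMechanism", [])
    for mechanism in mechanisms:
        pieces.append(f"{mechanism['Type']}: {mechanism['Value']}")
    return ", ".join(pieces)
```

With the two examples from this page as input, the function reproduces "SCIENCE CONTACT: John Doe, Telephone: (555)1212, Email: johndoe@gmail.com" and "USER SERVICES: DAAC User Services, Email: daacuserservices@daacname.gov".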
How should pick lists / controlled vocabulary be handled?
Data Center Field | Current Source of Pick list values (MMT) | Proposed Source of Pick list values | Is this field used for EDSC Faceted Search? |
---|---|---|---|
DataCenter/Role | Schema ENUM - current ResponsibilityRoleEnum | Schema ENUM - proposed DataCenterRoleEnum ["ARCHIVER", "DISTRIBUTOR", "ORIGINATOR", "PROCESSOR"] | no |
DataCenter/ShortName | KMS (Data Center) | KMS (Data Center) | yes |
DataCenter/LongName | KMS, or auto fill from KMS after selecting short name | KMS, or auto fill from KMS after selecting short name | no |
DataCenter/ContactInformation/ContactMechanism/Type | GCMD list plus current MMT values | Schema ENUM - proposed (GCMD list plus current MMT values) | no |
DataCenter/ContactInformation/Address/Country | ISO | ISO | no |
DataCenter/ContactInformation/Address/StateProvince | ISO | ISO | no |
DataCenter/ContactInformation/RelatedURLs | auto fill from KMS after selecting short name or long name | auto fill from KMS after selecting short name or long name | no |
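The "auto fill from KMS" entries in the Data Center table can be illustrated with a small lookup. `KMS_DATA_CENTERS` is a hypothetical in-memory stand-in for the KMS Data Center keywords, and its single entry is a placeholder, not a real KMS record. The sketch matches on short name first and then on long name, mirroring "after selecting short name or long name".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class KmsDataCenter:
    short_name: str
    long_name: str
    urls: List[str] = field(default_factory=list)


# Hypothetical stand-in for the KMS Data Center keywords (placeholder entry).
KMS_DATA_CENTERS: List[KmsDataCenter] = [
    KmsDataCenter("EXAMPLE/DAAC", "Example Active Archive Center", ["https://daac.example.gov"]),
]


def find_kms_data_center(name: str) -> Optional[KmsDataCenter]:
    """Match on short name first, then on long name."""
    for entry in KMS_DATA_CENTERS:
        if entry.short_name == name:
            return entry
    for entry in KMS_DATA_CENTERS:
        if entry.long_name == name:
            return entry
    return None


def autofill_data_center(name: str) -> Optional[Dict[str, Any]]:
    """Fill ShortName, LongName, and RelatedURLs from the selected keyword."""
    entry = find_kms_data_center(name)
    if entry is None:
        return None
    return {
        "ShortName": entry.short_name,
        "LongName": entry.long_name,
        "ContactInformation": {"RelatedURLs": list(entry.urls)},
    }
```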
Data Contact Field | Current Source of Pick list values (MMT) | Proposed Source of Pick list values | Is this field used for EDSC Faceted Search? |
---|---|---|---|
DataContact/Role, DataCenter/DataContact/Role | Schema ENUM - current ResponsibilityRoleEnum | Schema ENUM - proposed DataContactRoleEnum | no |
DataContact/ContactInformation/ContactMechanism/Type, DataCenter/DataContact/ContactInformation/ContactMechanism/Type | GCMD list plus current MMT values | Schema ENUM - proposed (GCMD list plus current MMT values) | no |
DataContact/ContactInformation/Address/Country, DataCenter/DataContact/ContactInformation/Address/Country | ISO | ISO | no |
DataContact/ContactInformation/Address/StateProvince, DataCenter/DataContact/ContactInformation/Address/StateProvince | ISO | ISO | no |
Reconciliation of Existing Metadata values with new rules
- UMMC-427
Map values per Recommendation 3
Standardize DataCenter short name, long name, and URL per the KMS values (the MMT can help the user do this one record at a time).
Correct organization roles that are really personnel roles
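The last reconciliation step, correcting organization roles that are really personnel roles, can be sketched as a sorting pass over legacy organization entries. The sketch assumes the role values have already been mapped per Recommendation 3, and the entry shape (a dictionary with a Role key) is hypothetical. Anything matching neither proposed list is set aside for manual review.

```python
from typing import Dict, List, Tuple

DATA_CENTER_ROLES = {"ARCHIVER", "DISTRIBUTOR", "ORIGINATOR", "PROCESSOR"}
DATA_CONTACT_ROLES = {
    "DATA CENTER CONTACT", "TECHNICAL CONTACT", "SCIENCE CONTACT", "INVESTIGATOR",
    "METADATA AUTHOR", "USER SERVICES", "SCIENCE SOFTWARE",
}

Entry = Dict[str, str]


def reconcile_organizations(organizations: List[Entry]) -> Tuple[List[Entry], List[Entry], List[Entry]]:
    """Sort legacy organization entries into (data centers, entries whose role
    is really a personnel role, entries needing manual review)."""
    centers: List[Entry] = []
    contacts: List[Entry] = []
    review: List[Entry] = []
    for org in organizations:
        role = org.get("Role", "").strip().upper()
        if role in DATA_CENTER_ROLES:
            centers.append(org)
        elif role in DATA_CONTACT_ROLES:
            contacts.append(org)  # candidate to move to DataContact
        else:
            review.append(org)
    return centers, contacts, review
```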
Impact of Changes on Ingest of Granule Metadata
None - Organization and Personnel information can be changed in a Collection record without impacting the Collection's granule records.