
Element Description

The Data Language element describes the language used in the preparation, storage, and description of the collection. It is the language of the collection data itself. It does not refer to the language used in the metadata record (although this may be the same language). Please refer to the Metadata Language wiki page on how to specify the language used in the metadata. 

Best Practices

While not required, it is recommended that an ISO 639-2 code be used to populate the Data Language field (http://www.loc.gov/standards/iso639-2/php/code_list.php). If no language is supplied, English is assumed as the default.
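As a rough illustration of this recommendation, the sketch below checks whether a value has the shape of an ISO 639-2 code and falls back to English when nothing is supplied. The function names are hypothetical, the check covers only the shape of the code (three lowercase letters, optionally followed by "(B)" or "(T)") rather than membership in the official code list, and representing the default as "eng" is an assumption.

```python
import re
from typing import List, Optional

# Assumption: the default (English) is represented by its ISO 639-2 code.
DEFAULT_LANGUAGE = "eng"

# Shape check only: three lowercase letters, optionally followed by " (B)"
# (bibliographic) or " (T)" (terminology). This does not confirm that the
# code actually exists in the ISO 639-2 code list.
_ISO_639_2_SHAPE = re.compile(r"^[a-z]{3}( \([BT]\))?$")


def looks_like_iso_639_2(value: str) -> bool:
    """Return True if the value has the shape of an ISO 639-2 code."""
    return bool(_ISO_639_2_SHAPE.match(value))


def effective_data_languages(values: Optional[List[str]]) -> List[str]:
    """Return the supplied values, or the assumed default if none were given."""
    return list(values) if values else [DEFAULT_LANGUAGE]
```

A value such as "French" is still permitted by the element; the shape check only flags that it does not follow the recommended practice.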

Examples:

"eng"  ← Example using the ISO 639-2 code for English

"French" ← Example where the ISO 639-2 code is not used

"chi (B)" ← Example using the ISO 639-2 code for Chinese (See this FAQ for an explanation why "(B)" is included in the code)


Element Specification

Providing a Data Language is optional. Multiple Data Languages may be provided, if necessary (Cardinality: 0..*)

Model | Element | Type | Constraints | Required? | Cardinality
UMM-C | DataLanguage | String | 1 - 25 characters | No | 0..*
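The constraints in this specification can be sketched as a small check. This is a minimal illustration, not CMR's validation logic: the function name is hypothetical, and the values are assumed to arrive as a list of strings so that the 0..* cardinality can be represented.

```python
from typing import List

MIN_LENGTH = 1
MAX_LENGTH = 25


def data_language_errors(values: List[str]) -> List[str]:
    """Return a list of constraint violations; an empty list means none."""
    errors = []
    for index, value in enumerate(values):
        if not isinstance(value, str):
            errors.append(f"DataLanguage[{index}] must be a string")
        elif not MIN_LENGTH <= len(value) <= MAX_LENGTH:
            errors.append(
                f"DataLanguage[{index}] must be {MIN_LENGTH} to {MAX_LENGTH} "
                f"characters, got {len(value)}"
            )
    return errors
```

An empty list of values produces no errors, since the element is optional.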


Metadata Validation and QA/QC

All metadata entering the CMR goes through the process below to ensure that metadata quality requirements are met. All records undergo CMR validation before entering the system. The QA/QC process is slightly different for NASA and non-NASA data providers. Non-NASA providers include interagency and international data providers and are referred to as the International Directory Network (IDN).

[Figure: Metadata Evaluation Workflow flowchart]

Please see the expandable sections below for flowchart details.


GCMD Metadata QA/QC

  • Manual Review
    • Identify errors, discrepancies or omissions.
  • Automated Review
    • Check that the field value matches the enumeration value. (English;Afrikaans;Arabic;Bosnian;Bulgarian;Chinese;Croatian;Czech;Danish;Dutch;Estonian;Finnish;French;German;Hebrew;Hungarian;Indonesian;Italian;Japanese;Korean;Latvian;Lithuanian;Norwegian;Polish;Portuguese;Romanian;Russian;Slovak;Spanish;Ukrainian;Vietnamese)
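The automated enumeration check can be sketched as a simple membership test. This is an illustration only: the function name is hypothetical, and exact, case-sensitive matching is an assumption about how the comparison is made.

```python
# Enumeration values as listed for the GCMD automated review.
GCMD_LANGUAGES = frozenset(
    "English;Afrikaans;Arabic;Bosnian;Bulgarian;Chinese;Croatian;Czech;"
    "Danish;Dutch;Estonian;Finnish;French;German;Hebrew;Hungarian;"
    "Indonesian;Italian;Japanese;Korean;Latvian;Lithuanian;Norwegian;"
    "Polish;Portuguese;Romanian;Russian;Slovak;Spanish;Ukrainian;"
    "Vietnamese".split(";")
)


def matches_enumeration(value: str) -> bool:
    """Return True if the value is one of the enumeration values.

    Assumption: the match is exact and case-sensitive.
    """
    return value in GCMD_LANGUAGES
```

Under this sketch an ISO 639-2 code such as "eng" would not match the enumeration, even though it follows the recommended practice for the UMM field.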

CMR Validation

  • This element is currently not validated.

ARC Metadata QA/QC

ARC Priority Matrix

Priority Categorization and Justification

Red = High Priority Finding

This element is categorized as highest priority when:

  • The Data Language(s) provided are incorrect for the dataset.

Yellow = Medium Priority Finding

Not applicable

Blue = Low Priority Finding

This element is categorized as low priority when:

  • No Data Language is provided when the Data Language is a language other than English (the assumed default language).

Green = No Findings/Issues

The element is provided and follows all applicable criteria specified in the best practices section above.
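The priority matrix above can be read as a small decision rule. The sketch below is one interpretation, not ARC's tooling: the function name is hypothetical, the actual language(s) of the data are assumed to be known from manual review, and treating an omitted Data Language on English-only data as no finding is an assumption based on English being the assumed default.

```python
from typing import List


def arc_priority(provided: List[str], actual: List[str]) -> str:
    """Classify a Data Language finding as 'red', 'blue', or 'green'.

    provided: Data Language values found in the metadata record.
    actual:   languages the data is really in (known from manual review).
    """
    if provided:
        # Red: at least one provided value is incorrect for the dataset.
        return "green" if set(provided) <= set(actual) else "red"
    # Blue: nothing provided although the data is not in English.
    if any(language != "English" for language in actual):
        return "blue"
    # Assumption: omitted value on English data is not flagged.
    return "green"
```

For example, `arc_priority(["French"], ["English"])` is classified as red, while `arc_priority([], ["French"])` is classified as blue.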

ARC Automated Rules

TBD

Checks

ARC uses the pyQuARC library for automated metadata checks. Please see the pyQuARC GitHub for more information.  

Dialect Mappings


DIF 9 (Note: DIF-9 is being phased out and will no longer be supported after 2018)


DIF 10

Providing a Dataset_Language is optional. Multiple Dataset_Languages may be provided, if necessary (Cardinality: 0..*)

UMM-C Element: DataLanguage
Path: /DIF/Dataset_Language
Type: Enumeration
Usable Valid Values: English, Afrikaans, Arabic, Bosnian, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Ukrainian, Vietnamese
Required in DIF 10?: No
Cardinality: 0..*
Notes: Location of the DatasetLanguageEnum in the DIF 10.3 schema: https://git.earthdata.nasa.gov/projects/EMFD/repos/dif-schemas/browse/10.x/UmmCommon_1.3.xsd#213
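As a sketch of how the /DIF/Dataset_Language path might be read, the snippet below collects every Dataset_Language child of the DIF root using Python's standard xml.etree.ElementTree module. The function name and the sample record are hypothetical, and matching on the local tag name (so the code works with or without a default namespace) is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from typing import List


def dif10_data_languages(dif_xml: str) -> List[str]:
    """Return the text of each Dataset_Language element under the DIF root."""
    root = ET.fromstring(dif_xml)
    languages = []
    for child in root:
        # Strip a "{namespace}" prefix, if any, and compare the local name.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name == "Dataset_Language" and child.text and child.text.strip():
            languages.append(child.text.strip())
    return languages


# Hypothetical minimal record for illustration.
SAMPLE_DIF = "<DIF><Dataset_Language>English</Dataset_Language></DIF>"
print(dif10_data_languages(SAMPLE_DIF))
```

Because the cardinality is 0..*, the function returns a list, which is empty when no Dataset_Language is present.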


Example Mapping

DIF 10

<Dataset_Language>English</Dataset_Language>

UMM

"DataLanguage": "English"

ECHO 10

Data Language does not map to ECHO 10.


ISO 19115-2 MENDS

Providing a Data Language is optional. Multiple Data Languages may be provided (Cardinality: 0..*).

UMM-C Element: DataLanguage

Path: /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:language/gco:CharacterString

with

/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue=UTF8

Type: String (first path), Codelist (second path)

Notes: Maps to the UMM element Data Language.

Provide the data language (e.g. English) in the Character String (first path listed).

The codelist value of "UTF8" must be provided with the string in order for CMR to properly parse out the data language.
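The two paths above can be read together as in the following sketch, which uses Python's standard xml.etree.ElementTree module. It is an illustration rather than CMR's parser: the function name is hypothetical, the namespace URIs are the commonly used ISO 19139 ones and are an assumption here, and returning None when the "UTF8" codelist value is missing is only a way of modelling the requirement stated in the notes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URIs for the gmi, gmd, and gco prefixes.
NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
_IDENT = "gmd:identificationInfo/gmd:MD_DataIdentification"


def mends_data_language(xml_text: str) -> Optional[str]:
    """Return the data language, or None if either path is not satisfied."""
    root = ET.fromstring(xml_text)  # root element is gmi:MI_Metadata
    language = root.find(f"{_IDENT}/gmd:language/gco:CharacterString", NS)
    charset = root.find(f"{_IDENT}/gmd:MD_CharacterSetCode", NS)
    if language is None or not (language.text or "").strip():
        return None
    if charset is None or charset.get("codeListValue") != "UTF8":
        return None
    return language.text.strip()


# Hypothetical minimal record for illustration.
SAMPLE_MENDS = """<gmi:MI_Metadata
    xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:language>
        <gco:CharacterString>English</gco:CharacterString>
      </gmd:language>
      <gmd:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="UTF8">UTF8</gmd:MD_CharacterSetCode>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmi:MI_Metadata>"""
print(mends_data_language(SAMPLE_MENDS))
```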


Example Mapping

ISO 19115-2 MENDS

<gmi:MI_Metadata>
  ...
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      ...
      <gmd:language>
        <gco:CharacterString>English</gco:CharacterString>
      </gmd:language>
      <gmd:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode"  codeListValue="UTF8">UTF8</gmd:MD_CharacterSetCode>
      ...
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  ...
</gmi:MI_Metadata>

UMM

"DataLanguage": "English"

ISO 19115-2 SMAP

Providing a Data Language is optional. Multiple Data Languages may be provided (Cardinality: 0..*).

UMM-C Element: DataLanguage

Path: /gmd:DS_Series/gmd:seriesMetadata/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:language/gco:CharacterString

with

/gmd:DS_Series/gmd:seriesMetadata/gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue=UTF8

Type: String (first path), Codelist (second path)

Notes: Maps to the UMM element Data Language.

Provide the data language (e.g. English) in the Character String (first path listed).

The codelist value of "UTF8" must be provided with the string in order for CMR to properly parse out the data language.


Example Mapping

ISO 19115-2 SMAP

<gmd:DS_Series>
  <gmd:seriesMetadata>
    <gmi:MI_Metadata>
      ...
      <gmd:identificationInfo>
        <gmd:MD_DataIdentification>
          ...
          <gmd:language>
            <gco:CharacterString>English</gco:CharacterString>
          </gmd:language>
          <gmd:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode"  codeListValue="UTF8">UTF8</gmd:MD_CharacterSetCode>
        </gmd:MD_DataIdentification>
        ...
      </gmd:identificationInfo>
      ...
    </gmi:MI_Metadata>
  </gmd:seriesMetadata>
</gmd:DS_Series>

UMM

"DataLanguage": "English"

UMM Migration

None


Future Mappings


ISO 19115-1

Providing a Data Language is optional. Multiple Data Languages may be provided (Cardinality: 0..*).

UMM-C Element: DataLanguage

Path: /mdb:MD_Metadata/mdb:identificationInfo/mri:MD_DataIdentification/mri:defaultLocale/lan:PT_Locale/lan:language/lan:LanguageCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#LanguageCode" codeListValue=

with

/mdb:MD_Metadata/mdb:identificationInfo/mri:MD_DataIdentification/mri:defaultLocale/lan:PT_Locale/lan:characterEncoding/lan:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue=

Type: Codelist

Notes: Maps to the UMM element Data Language.

Select the data language from the Language Codelist.

Select a CharacterSetCode. A character set code must be provided in addition to the language in order for CMR to properly parse out the data language.
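For this future mapping the language is carried in the codeListValue attribute rather than in element text. The sketch below shows one way the two paths might be read with Python's standard xml.etree.ElementTree module. The function name is hypothetical, the namespace URIs for the mdb, mri, and lan prefixes are assumptions, and returning None when no character set code is present only models the requirement stated in the notes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URIs for the mdb, mri, and lan prefixes.
NS = {
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/1.0",
    "mri": "http://standards.iso.org/iso/19115/-3/mri/1.0",
    "lan": "http://standards.iso.org/iso/19115/-3/lan/1.0",
}
_LOCALE = (
    "mdb:identificationInfo/mri:MD_DataIdentification/"
    "mri:defaultLocale/lan:PT_Locale"
)


def iso19115_1_data_language(xml_text: str) -> Optional[str]:
    """Return the LanguageCode codeListValue, or None if a path is missing."""
    root = ET.fromstring(xml_text)  # root element is mdb:MD_Metadata
    language = root.find(f"{_LOCALE}/lan:language/lan:LanguageCode", NS)
    charset = root.find(
        f"{_LOCALE}/lan:characterEncoding/lan:MD_CharacterSetCode", NS
    )
    if language is None or charset is None:
        return None
    if not charset.get("codeListValue"):
        return None
    return language.get("codeListValue") or None


# Hypothetical minimal record for illustration.
SAMPLE_19115_1 = """<mdb:MD_Metadata
    xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/1.0"
    xmlns:mri="http://standards.iso.org/iso/19115/-3/mri/1.0"
    xmlns:lan="http://standards.iso.org/iso/19115/-3/lan/1.0">
  <mdb:identificationInfo>
    <mri:MD_DataIdentification>
      <mri:defaultLocale>
        <lan:PT_Locale>
          <lan:language>
            <lan:LanguageCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#LanguageCode" codeListValue="English">English</lan:LanguageCode>
          </lan:language>
          <lan:characterEncoding>
            <lan:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="UTF8">UTF8</lan:MD_CharacterSetCode>
          </lan:characterEncoding>
        </lan:PT_Locale>
      </mri:defaultLocale>
    </mri:MD_DataIdentification>
  </mdb:identificationInfo>
</mdb:MD_Metadata>"""
print(iso19115_1_data_language(SAMPLE_19115_1))
```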


Example Mapping

ISO 19115-1

<mdb:MD_Metadata>
  ...
  <mdb:identificationInfo>
    <mri:MD_DataIdentification>
      ...
      <mri:defaultLocale>
        <lan:PT_Locale>
          <lan:language>
            <lan:LanguageCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#LanguageCode" codeListValue="English">English</lan:LanguageCode>
          </lan:language>
          <lan:characterEncoding>
            <lan:MD_CharacterSetCode codeList="https://cdn.earthdata.nasa.gov/iso/resources/Codelist/gmxCodelists.xml#MD_CharacterSetCode" codeListValue="UTF8">UTF8</lan:MD_CharacterSetCode>
          </lan:characterEncoding>
        </lan:PT_Locale>
      </mri:defaultLocale>
      ...
    </mri:MD_DataIdentification>
  </mdb:identificationInfo>
  ...
</mdb:MD_Metadata>

UMM

"DataLanguage": "English"

History

UMM Versioning

Version | Date | What Changed
1.15.5 | 12/3/2020 | No changes were made for Data Language during the transition from version 1.15.4 to 1.15.5.
1.15.4 | 9/18/2020 | No changes were made for Data Language during the transition from version 1.15.3 to 1.15.4.
1.15.3 | 7/1/2020 | No changes were made for Data Language during the transition from version 1.15.2 to 1.15.3.
1.15.2 | 5/20/2020 | No changes were made for Data Language during the transition from version 1.15.1 to 1.15.2.
1.15.1 | 3/25/2020 | No changes were made for Data Language during the transition from version 1.15.0 to 1.15.1.
1.15.0 | 2/26/2020 | No changes were made for Data Language during the transition from version 1.14.0 to 1.15.0.
1.14.0 | 10/21/2019 | No changes were made for Data Language during the transition from version 1.13.0 to 1.14.0.
1.13.0 | 04/11/2019 | No changes were made for Data Language during the transition from version 1.12.0 to 1.13.0.
1.12.0 | 01/22/2019 | No changes were made for Data Language during the transition from version 1.11.0 to 1.12.0.
1.11.0 | 11/28/2018 | No changes were made for Data Language during the transition from version 1.10.0 to 1.11.0.
1.10.0 | 05/02/2018 | No changes were made for Data Language during the transition from version 1.9.0 to 1.10.0.

ARC Documentation

Version | Date | What Changed | Author
1.0 | 10/12/18 | Recommendations/priority matrix transferred from internal ARC documentation to wiki space |