Hi, I've been trying to work with uploading some metadata to CMR and have been getting unexpected results.  

I initially retrieved (from Reverb) some ISO metadata that existed in the NSIDCV0 provider for dataset NSIDC-0547.  I was able to successfully upload it to the NSIDC_TS1 provider and search for the dataset there.

Then we tried altering the metadata and updating it.  It appeared to load and gave a "200 OK" result and returned a concept id.  However, the metadata didn't actually seem to get updated.

We then deleted the dataset, and tried uploading the new file.  Again, we got successful responses, but could not find the dataset on a search.

 

Since then we have tried various combinations of uploading and deleting, both the original file and the updated one.  It's unclear now what CMR has for the dataset.

Search on short name returns (for the NSIDC_TS1 provider):

n5oml01{cfowler}[101]->curl -i -g -H "Echo-Token: XXXX" "https://cmr.uat.earthdata.nasa.gov/search/collections.umm-json?pretty=true&short_name=NSIDC-0547"
HTTP/1.1 200 OK
Date: Mon, 06 Jun 2016 21:01:22 GMT
Content-Type: application/vnd.nasa.cmr.umm+json; charset=utf-8
Access-Control-Expose-Headers: CMR-Hits, CMR-Request-Id
Access-Control-Allow-Origin: *
CMR-Hits: 3
CMR-Took: 11
CMR-Request-Id: 5b273735-82da-4d57-a747-2f57f146900e
Content-Length: 1705
Server: Jetty(9.2.z-SNAPSHOT)

{
"hits" : 3,

.

.

.

"meta" : {
"native-id" : "collections/NSIDC-0547.001",
"provider-id" : "NSIDC_TS1",
"concept-type" : "collection",
"concept-id" : "C1216143440-NSIDC_TS1",
"revision-date" : "2016-06-02T23:08:21Z",
"user-id" : "cathyf",
"deleted" : false,
"revision-id" : 4,
"format" : "application/iso19115+xml"
},
"umm" : {
"entry-title" : "MEaSUREs MODIS Mosaic of Greenland 2005 (MOG2005) Image Map V001",
"entry-id" : "NSIDC-0547_1",
"short-name" : "NSIDC-0547",
"version-id" : "1"

.

.

.

Delete returns:

n5oml01{cfowler}[169]->curl -i -XDELETE -H "Echo-Token: XXXX" https://cmr.uat.earthdata.nasa.gov/ingest/providers/NSIDC_TS1/collections/NSIDC-0547.001
HTTP/1.1 404 Not Found
Date: Mon, 06 Jun 2016 21:02:39 GMT
Content-Type: application/xml; charset=ISO-8859-1
Access-Control-Allow-Origin: *
CMR-Request-Id: cc0143fc-d73b-42b0-8c18-ddd8b6ba40a1
Content-Length: 168
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?><errors><error>Concept with native-id [NSIDC-0547.001] and concept-id [C1216110036-NSIDC_TS1] is already deleted.</error></errors>

 

-->> I see that the concept id here is different, but I'm only providing the native id on the command line so am not sure what this means.  I want to delete concept id = C1216143440-NSIDC_TS1.

 

Upload returns:

curl -i -XPUT -H "Content-type: application/iso19115+xml" -H "Echo-Token: XXXX" https://cmr.uat.earthdata.nasa.gov/ingest/providers/NSIDC_TS1/collections/collections/NSIDC-0547.001 -d @NSIDC-0547_iso.xml_from_julia
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Mon, 06 Jun 2016 21:03:13 GMT
Content-Type: application/xml; charset=ISO-8859-1
CMR-Request-Id: 72d39dae-531b-4e6c-b47b-b08dfccf55f0
Content-Length: 129
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?><result><concept-id>C1216143440-NSIDC_TS1</concept-id><revision-id>8</revision-id></result>

-->> However, if I now go back and do a search, the date has not changed from the original upload time:     "revision-date" : "2016-06-02T23:08:21Z"
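
One thing that may be worth checking in the DELETE and PUT commands above is which native id each URL actually addresses. The sketch below assumes (this is not confirmed anywhere in this thread) that everything after `/collections/` in the ingest path is taken verbatim as the native id. The helper name `native_id_from_ingest_url` is made up for illustration.

```python
from typing import Tuple
from urllib.parse import urlparse


def native_id_from_ingest_url(url: str) -> Tuple[str, str]:
    """Return (provider_id, native_id) for an ingest collection URL.

    Assumed path shape: /ingest/providers/<provider-id>/collections/<native-id>
    """
    parts = urlparse(url).path.split("/")
    # parts looks like ['', 'ingest', 'providers', <provider>, 'collections', ...]
    if len(parts) < 6 or parts[1:3] != ["ingest", "providers"] or parts[4] != "collections":
        raise ValueError("not an ingest collection URL: " + url)
    # Everything after the first 'collections' segment is treated as the native id.
    return parts[3], "/".join(parts[5:])


BASE = "https://cmr.uat.earthdata.nasa.gov/ingest/providers/NSIDC_TS1"
delete_url = BASE + "/collections/NSIDC-0547.001"
put_url = BASE + "/collections/collections/NSIDC-0547.001"

print(native_id_from_ingest_url(delete_url))  # ('NSIDC_TS1', 'NSIDC-0547.001')
print(native_id_from_ingest_url(put_url))     # ('NSIDC_TS1', 'collections/NSIDC-0547.001')
```

Under that assumption, the DELETE addresses native id NSIDC-0547.001 while the PUT addresses collections/NSIDC-0547.001, which is the native-id shown in the search result for C1216143440-NSIDC_TS1.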

 

 

Going to Earthdata Search and clicking on the "info" button for the dataset produces an error:

Error retrieving C1216143440-NSIDC TS1

 

 

 


9 Comments

  1. Cathy, 

    This is the same problem as documented in CMR-3022. The ISO schema puts each science keyword in a single string delimited by &gt;, without any validation rules. In CMR, we require that the category, topic and term keywords exist for every science keyword. For this particular collection (C1216143440-NSIDC_TS1), the science keyword "MEASURES &gt;Making Earth System Data Records for Use in Research Environments" does not have a term keyword, which caused the indexing of the collection to fail. Once the fix for CMR-3022 is in place, this collection will fail validation, so you will get a 422 error when ingesting it.

    In the meantime, you can fix the metadata by adding a term keyword to the science keyword.

    If you think no term keyword should be provided from the science data point of view, then we need to get more people involved to decide whether the term keyword should be made optional for science keywords instead.

    Let me know what you think. Thanks!

    Leo

  2. I don't have permissions to view the record in question on UAT.  However, the CMR should only require "Category > Topic > Term" for Science Keywords.  MEASURES is a Project keyword so Term does not apply. 

    Project: Short_Name > Long_Name (Short_Name should be required; Long_Name optional)

    Science Keyword: Category > Topic > Term > Variable_Level_1 > Variable_Level_2 > Variable_Level_3 (Category > Topic > Term required; Variable_Level_1 > Variable_Level_2 > Variable_Level_3 optional)
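
    As a rough illustration of the "Category > Topic > Term required" rule, here is a minimal sketch that splits a keyword string on ">" and reports which required levels are missing. The function names are hypothetical and this is not CMR's actual validation code; the two sample strings are taken from this thread.

```python
from typing import List

# The first three levels are the ones described as required for science keywords.
REQUIRED_LEVELS = ("Category", "Topic", "Term")


def split_keyword(keyword: str) -> List[str]:
    """Split a '>'-delimited keyword string into trimmed levels."""
    return [level.strip() for level in keyword.split(">")]


def missing_required_levels(keyword: str) -> List[str]:
    """Return the names of required levels that are absent or empty."""
    levels = split_keyword(keyword)
    return [
        name
        for index, name in enumerate(REQUIRED_LEVELS)
        if index >= len(levels) or not levels[index]
    ]


science = "EARTH SCIENCE >Cryosphere >Glaciers/Ice Sheets >Firn >Snow Grain Size"
project = "MEASURES >Making Earth System Data Records for Use in Research Environments"

print(missing_required_levels(science))  # []
print(missing_required_levels(project))  # ['Term']
```

    The project keyword only has two levels (Short_Name > Long_Name), so a check written for science keywords flags it as missing a term.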

  3. Thanks, Leo and Scott.  I'm not sure between your two comments if there is any action needed on my side or if the "term" issue needs to be worked out there first.  Please let me know.  Thanks!

  4. I am including the collection metadata in question here. It looks like the way we parse ISO19115 science keywords is not what NSIDC expected. Here is our rule for identifying science keywords in ISO:

    We treat every gmd:descriptiveKeywords/gmd:MD_Keywords element that has a gmd:MD_KeywordTypeCode of "theme" as science keywords. This is not how NSIDC uses gmd:MD_Keywords. From the included metadata, it looks like NSIDC wants to distinguish the different keywords by their title element; e.g. we have "NASA / GCMD Science Keywords" and "NASA / GCMD Project Keywords" as titles for various keywords. Determining the right way to represent (and thus parse) the different keywords in ISO is outside my expertise. I am including Erich Reiter in this discussion to clarify the correct representation of science keywords and other keywords in ISO19115 format.

    Once that is clarified, I will make the corresponding changes in CMR.

    C1216143440-NSIDC_TS1.xml

  5. To the best of my ISO knowledge, <CodeDefinition gml:id="MD_KeywordTypeCode_theme"> is the right code for Project and Science Keywords in general.  I don't see a code specifically for "Projects".  However, I don't think you can assume that anything with a gmd:MD_KeywordTypeCode of "theme" is a science keyword.  I recommend that the CMR key off of something else, perhaps <gmd:thesaurusName>.  Ideally you could do this with a code list, but I don't see anything in the ISO code list for "Projects".
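
    To make the two options concrete (grouping by keyword type code versus grouping by thesaurus title), here is a minimal sketch using Python's standard ElementTree. It is not CMR's parser: the function name `keywords_by` and the trimmed-down sample document are assumptions for illustration, and the sample omits the codeList attribute and the dates for brevity.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Trimmed-down sample: both blocks are typed "theme", as in the metadata discussed here.
SAMPLE = """<root xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:descriptiveKeywords><gmd:MD_Keywords>
    <gmd:keyword><gco:CharacterString>EARTH SCIENCE &gt;Cryosphere &gt;Glaciers/Ice Sheets &gt;Firn &gt;Snow Grain Size</gco:CharacterString></gmd:keyword>
    <gmd:type><gmd:MD_KeywordTypeCode codeListValue="theme">theme</gmd:MD_KeywordTypeCode></gmd:type>
    <gmd:thesaurusName><gmd:CI_Citation><gmd:title><gco:CharacterString>NASA / GCMD Science Keywords</gco:CharacterString></gmd:title></gmd:CI_Citation></gmd:thesaurusName>
  </gmd:MD_Keywords></gmd:descriptiveKeywords>
  <gmd:descriptiveKeywords><gmd:MD_Keywords>
    <gmd:keyword><gco:CharacterString>MEASURES &gt;Making Earth System Data Records for Use in Research Environments</gco:CharacterString></gmd:keyword>
    <gmd:type><gmd:MD_KeywordTypeCode codeListValue="theme">theme</gmd:MD_KeywordTypeCode></gmd:type>
    <gmd:thesaurusName><gmd:CI_Citation><gmd:title><gco:CharacterString>NASA / GCMD Project Keywords</gco:CharacterString></gmd:title></gmd:CI_Citation></gmd:thesaurusName>
  </gmd:MD_Keywords></gmd:descriptiveKeywords>
</root>"""


def keywords_by(root, key):
    """Group keyword strings by 'type' (codeListValue) or 'title' (thesaurus title)."""
    groups = {}
    for block in root.iterfind(".//gmd:MD_Keywords", NS):
        words = [
            el.text.strip()
            for el in block.iterfind("gmd:keyword/gco:CharacterString", NS)
            if el.text
        ]
        if key == "type":
            code = block.find("gmd:type/gmd:MD_KeywordTypeCode", NS)
            label = code.get("codeListValue") if code is not None else None
        else:
            title = block.find(
                "gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString", NS
            )
            label = title.text.strip() if title is not None and title.text else None
        groups.setdefault(label, []).extend(words)
    return groups


root = ET.fromstring(SAMPLE)
print(keywords_by(root, "type"))   # one "theme" group holding both keywords
print(keywords_by(root, "title"))  # separate science and project groups
```

    With the sample typed this way, grouping by type lumps the MEASURES project keyword in with the science keywords, while grouping by thesaurus title keeps them apart.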

  6. No - Projects have a keyword type code of project as shown below:  

    <gmd:type>
      <gmd:MD_KeywordTypeCode codeList="...#MD_KeywordTypeCode" codeListValue="project">project</gmd:MD_KeywordTypeCode>
    </gmd:type>

    Science Keywords have a keyword type code of theme as shown here:
    <gmd:type>
      <gmd:MD_KeywordTypeCode codeList="...#MD_KeywordTypeCode" codeListValue="theme">theme</gmd:MD_KeywordTypeCode>
    </gmd:type>

    Here is the list:

    science keywords ->  theme
    data center -> dataCenter
    discipline -> discipline -  keyword identifies a branch of instruction or specialized learning
    instrument -> instrument
    location -> place
    platform -> platform
    project -> project
    stratum -> stratum - keyword identifies the layer(s) of any deposited substance
    temporal -> temporal

    So for this section the type needs to be theme:
    <gmd:descriptiveKeywords>
     <gmd:MD_Keywords>
     <gmd:keyword>
     <gco:CharacterString>
    EARTH SCIENCE >Cryosphere >Glaciers/Ice Sheets >Firn >Snow Grain Size
    </gco:CharacterString>
    </gmd:keyword>
     <gmd:keyword>
     <gco:CharacterString>
    EARTH SCIENCE >Cryosphere >Glaciers/Ice Sheets >Firn >Snow Grain Size
    </gco:CharacterString>
    </gmd:keyword>
     <gmd:keyword>
     <gco:CharacterString>
    EARTH SCIENCE >Cryosphere >Glaciers/Ice Sheets >Glacier Topography/Ice Sheet Topography >Surface Morphology
    </gco:CharacterString>
    </gmd:keyword>
     <gmd:keyword> 
     <gco:CharacterString>
    EARTH SCIENCE >Cryosphere >Glaciers/Ice Sheets >Glacier Topography/Ice Sheet Topography >Surface Morphology
    </gco:CharacterString>
    </gmd:keyword>
     <gmd:type> 
      <gmd:MD_KeywordTypeCode codeList="...#MD_KeywordTypeCode" codeListValue="discipline">discipline</gmd:MD_KeywordTypeCode>
    </gmd:type>
     <gmd:thesaurusName>
     <gmd:CI_Citation>
     <gmd:title>
    <gco:CharacterString>NASA / GCMD Science Keywords</gco:CharacterString>
    </gmd:title>
     <gmd:date>
     <gmd:CI_Date> 
     <gmd:date> 
    <gco:Date>2008-02-05</gco:Date>
    </gmd:date>
     <gmd:dateType>
      <gmd:CI_DateTypeCode codeList="...#CI_DateTypeCode" codeListValue="revision">revision</gmd:CI_DateTypeCode>
    </gmd:dateType>
    </gmd:CI_Date>
    </gmd:date>
    </gmd:CI_Citation>
    </gmd:thesaurusName>
    </gmd:MD_Keywords>
    </gmd:descriptiveKeywords>


    For this section the type needs to be project:
     
    <gmd:descriptiveKeywords>
     <gmd:MD_Keywords>
     <gmd:keyword> 
     <gco:CharacterString>
    MEASURES >Making Earth System Data Records for Use in Research Environments
    </gco:CharacterString>
    </gmd:keyword>
     <gmd:type>
      <gmd:MD_KeywordTypeCode codeList="...#MD_KeywordTypeCode" codeListValue="theme">theme</gmd:MD_KeywordTypeCode>
    </gmd:type>
     <gmd:thesaurusName> 
     <gmd:CI_Citation>
     <gmd:title>
    <gco:CharacterString>NASA / GCMD Project Keywords</gco:CharacterString>
    </gmd:title>
     <gmd:date>
     <gmd:CI_Date>
     <gmd:date>
    <gco:Date>2008-01-24</gco:Date>
    </gmd:date>
     <gmd:dateType>
      <gmd:CI_DateTypeCode codeList="...#CI_DateTypeCode" codeListValue="revision">revision</gmd:CI_DateTypeCode>
    </gmd:dateType>
    </gmd:CI_Date>
    </gmd:date>
    </gmd:CI_Citation>
    </gmd:thesaurusName>
    </gmd:MD_Keywords>
     
    </gmd:descriptiveKeywords>

    For this section the type should also be project
    <gmd:descriptiveKeywords>
     <gmd:MD_Keywords>
     <gmd:keyword>
    <gco:CharacterString>MEaSUREs-project</gco:CharacterString>
    </gmd:keyword>
     <gmd:keyword>
    <gco:CharacterString>MODIS-related project</gco:CharacterString>
    </gmd:keyword>
     <gmd:type>
      <gmd:MD_KeywordTypeCode codeList="...#MD_KeywordTypeCode" codeListValue="theme">theme</gmd:MD_KeywordTypeCode>
    </gmd:type>
    </gmd:MD_Keywords>
    </gmd:descriptiveKeywords>
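
    For the two sections that carry a thesaurus title, the retyping described above could be scripted roughly as follows. This is only a sketch under assumptions: the `TITLE_TO_TYPE` table and the function name `retype_keywords` are invented for illustration, the sample is a trimmed stand-in for the real record (codeList attribute and dates omitted), and blocks without a thesaurus title (like the third section) are left untouched and would need editing by hand.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical lookup: thesaurus title -> keyword type code from the list above.
TITLE_TO_TYPE = {
    "NASA / GCMD Science Keywords": "theme",
    "NASA / GCMD Project Keywords": "project",
}

SAMPLE = """<root xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:MD_Keywords>
    <gmd:type><gmd:MD_KeywordTypeCode codeListValue="discipline">discipline</gmd:MD_KeywordTypeCode></gmd:type>
    <gmd:thesaurusName><gmd:CI_Citation><gmd:title><gco:CharacterString>NASA / GCMD Science Keywords</gco:CharacterString></gmd:title></gmd:CI_Citation></gmd:thesaurusName>
  </gmd:MD_Keywords>
  <gmd:MD_Keywords>
    <gmd:type><gmd:MD_KeywordTypeCode codeListValue="theme">theme</gmd:MD_KeywordTypeCode></gmd:type>
    <gmd:thesaurusName><gmd:CI_Citation><gmd:title><gco:CharacterString>NASA / GCMD Project Keywords</gco:CharacterString></gmd:title></gmd:CI_Citation></gmd:thesaurusName>
  </gmd:MD_Keywords>
</root>"""


def retype_keywords(root):
    """Set each block's type code from its thesaurus title; return how many changed."""
    changed = 0
    for block in root.iterfind(".//gmd:MD_Keywords", NS):
        title = block.find(
            "gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString", NS
        )
        code = block.find("gmd:type/gmd:MD_KeywordTypeCode", NS)
        if title is None or code is None or not title.text:
            continue  # nothing to key off; leave the block alone
        wanted = TITLE_TO_TYPE.get(title.text.strip())
        if wanted and code.get("codeListValue") != wanted:
            code.set("codeListValue", wanted)
            code.text = wanted
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
print(retype_keywords(root))  # 2
print([c.get("codeListValue") for c in root.iterfind(".//gmd:MD_KeywordTypeCode", NS)])
# ['theme', 'project']
```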
  7. One clarification to Leo's original response:  I did not get a 422 error when attempting to upload any of the files I tried; I got a 200 response indicating success.  I'm still unclear on why that happened.

    When I validate any of the files, however, I get a 400 error ("bad request").  It would be helpful to get the more specific 422 response.

    1. Cathy, CMR-3022 is still being worked on. It is not merged yet and not installed in UAT. I was just describing the future behavior of CMR after CMR-3022 is fixed. :) I don't think you should get a 400 error when validating the collection in UAT. It must be caused by some unintended error in your validation request.