Breaking up long Keywords

In a recent ingest on the UAT instance of CMR, we got the following error message (line breaks added for legibility):

<?xml version="1.0" encoding="UTF-8"?><result><concept-id>C1216394131-NSIDC_TS1</concept-id>
<revision-id>14</revision-id>
<warnings>After translating item to UMM-C the metadata had the following issue: /LocationKeywords/2/Type string
"Continent > North America > United States Of America > California > Tuolumne River Basin" is too long (length: 89, maximum allowed: 80)
</warnings></result>

Many of the keywords are probably going to be long like this.  If the 80-character limit is indeed enforced, we need a way to break these keywords down.  Is there a way to split these multi-level keywords so that they fit in the UMM-C field while preserving their order and hierarchy?


5 Comments

  1. Note that this problem is only occurring in UAT.  In OPS the longer keywords ingest into CMR from our ISO correctly.

  2. I asked Erich Reiter to take a look at this. He's investigating it and says it might be a bug.

  3. There are two problems in this scenario:

    1) The last keyword needs to be changed.

    2) The CMR translation code needs to be fixed.

     

    For the first problem:

    The schema for the location keywords is the following:

        "LocationKeywordType": {
          "description": "This element defines a mapping to the GCMD KMS hierarchical location list. It replaces SpatialKeywords. Each tier must have data in the tier above it.",
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "Category":{
              "description": "Top-level controlled keyword hierarchical level that contains the largest general location where the collection data was taken from.",
              "$ref": "umm-cmn-json-schema.json#/definitions/KeywordStringType"
              },
            "Type":{
              "description": "Second-tier controlled keyword hierarchical level that contains the regional location where the collection data was taken from",
              "$ref": "umm-cmn-json-schema.json#/definitions/KeywordStringType"
              },
            "Subregion1":{
              "description": "Third-tier controlled keyword heirarchical level that contains the regional sub-location where the collection data was taken from",
              "$ref": "umm-cmn-json-schema.json#/definitions/KeywordStringType"
            },
            "Subregion2":{
              "description": "Fourth-tier controlled keyword heirarchical level that contains the regional sub-location where the collection data was taken from",
              "$ref": "umm-cmn-json-schema.json#/definitions/KeywordStringType"
            },
            "Subregion3":{
              "description": "Fifth-tier controlled keyword heirarchical level that contains the regional sub-location where the collection data was taken from",
              "$ref": "umm-cmn-json-schema.json#/definitions/KeywordStringType"
            },
            "DetailedLocation":{
              "description": "Uncontrolled keyword heirarchical level that contains the specific location where the collection data was taken from. Exists outside the heirarchy.",
              "$ref": "umm-cmn-json-schema.json#/definitions/KeywordStringType"
            }
          },
          "required": ["Category"]
        }

    The last keyword broken down into the UMM Location Keyword elements is:

    Category: Continent

    Type: North America

    Subregion1: United States of America

    Subregion2: California

    Subregion3: Tuolumne River Basin

    DetailedLocation:

    The above is not a valid location keyword as defined by GCMD: Tuolumne River Basin belongs in DetailedLocation rather than Subregion3.  To fix this, insert "NONE &gt;" between "California &gt;" and "Tuolumne River Basin" so that the Subregion3 position is marked as empty.  The corrected third keyword is shown below:

     

    <gmd:keyword>
        <gco:CharacterString>Continent &gt; North America &gt; United States Of America &gt; California &gt; NONE &gt; Tuolumne River Basin</gco:CharacterString>
    </gmd:keyword>

     

    The second problem is a translation issue from ISO to UMM.  Issue CMR-3903 has been filed to fix it.  The corrected translation into UMM should be the following:

    "LocationKeywords" : [ {
        "Category" : "Continent",
        "Type" : "North America",
        "Subregion1": "United States Of America"
      }, {
        "Category" : "Continent",
        "Type" : "North America",
        "Subregion1": "United States Of America",
        "Subregion2": "Colorado"
      }, {
        "Category" : "Continent",
        "Type" : "North America",
        "Subregion1": "United States Of America",
        "Subregion2": "California",
        "DetailedLocation": "Tuolumne River Basin"
      } ],
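
The mapping from a keyword string to the UMM fields can be sketched in Python. This is only an illustration under stated assumptions, not the CMR translation code: the function name `parse_location_keyword`, the constant names, and the choice to treat `NONE` as an empty tier are hypothetical. The field names come from the schema quoted above, and the 80-character limit comes from the UAT warning message. The input is assumed to be the keyword text after XML parsing, so `&gt;` has already become `>`.

```python
# Hypothetical sketch; this is NOT the actual CMR ISO-to-UMM translation code.

# Field names in hierarchy order, taken from the LocationKeywordType schema.
TIERS = ["Category", "Type", "Subregion1", "Subregion2", "Subregion3",
         "DetailedLocation"]

# Per-field limit as reported in the UAT warning ("maximum allowed: 80").
MAX_TIER_LENGTH = 80


def parse_location_keyword(keyword):
    """Split an 'A > B > C' keyword string into UMM LocationKeyword fields.

    A tier whose value is NONE (or empty) is treated as a placeholder and
    left out of the result (an assumption made for this sketch).
    """
    parts = [part.strip() for part in keyword.split(">")]
    if len(parts) > len(TIERS):
        raise ValueError("too many tiers: %d" % len(parts))

    result = {}
    for tier, value in zip(TIERS, parts):
        if not value or value.upper() == "NONE":
            continue  # placeholder for a missing tier
        if len(value) > MAX_TIER_LENGTH:
            raise ValueError("%s is longer than %d characters"
                             % (tier, MAX_TIER_LENGTH))
        result[tier] = value

    if "Category" not in result:
        raise ValueError("Category is required")
    return result


keyword = ("Continent > North America > United States Of America"
           " > California > NONE > Tuolumne River Basin")
location = parse_location_keyword(keyword)
```

With the length check applied per tier rather than to the whole string, each piece of the Tuolumne keyword fits comfortably within the limit, and the basin name lands in `DetailedLocation`.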

     

     

  4. We will modify our code to add the "NONE &gt;" bits for missing subcategories.  I did have one question about this, though: these extra bits are only needed if there is actually a DetailedLocation, correct?  For instance, if a particular Location Keyword is just the first one in your list above, it would only need to say:

    "Continent &gt; North America &gt; United States of America" 

    Is this correct?  That is, the NONEs are only needed as padding for the missing tiers that come before a DetailedLocation?

    1. Yes, that is correct.
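
The padding rule agreed on above could be sketched as follows. This is a minimal illustration, not code from CMR or from the data provider: the function name `build_location_keyword` and its `separator` parameter are hypothetical. It emits `NONE` only when a DetailedLocation is present and some subregion tiers are missing, and it leaves XML escaping of `>` as `&gt;` to whatever serializer writes the ISO record.

```python
# Hypothetical sketch of the NONE-padding rule discussed in this thread.

# Controlled tiers in hierarchy order; DetailedLocation is handled separately.
CONTROLLED_TIERS = ["Category", "Type", "Subregion1", "Subregion2",
                    "Subregion3"]


def build_location_keyword(fields, separator=" > "):
    """Join location keyword fields into a single 'A > B > C' string.

    NONE placeholders are added only when a DetailedLocation follows
    fewer than five populated controlled tiers.
    """
    values = [fields.get(tier) for tier in CONTROLLED_TIERS]

    # Drop empty tiers at the end of the hierarchy.
    while values and not values[-1]:
        values.pop()

    if not values:
        raise ValueError("Category is required")
    if any(not value for value in values):
        # Each tier must have data in the tier above it.
        raise ValueError("gap in the controlled tiers")

    detailed = fields.get("DetailedLocation")
    if detailed:
        # Pad the missing tiers so DetailedLocation sits in the sixth slot.
        values += ["NONE"] * (len(CONTROLLED_TIERS) - len(values))
        values.append(detailed)

    return separator.join(values)


plain = build_location_keyword({
    "Category": "Continent",
    "Type": "North America",
    "Subregion1": "United States Of America",
})

padded = build_location_keyword({
    "Category": "Continent",
    "Type": "North America",
    "Subregion1": "United States Of America",
    "Subregion2": "California",
    "DetailedLocation": "Tuolumne River Basin",
})
```

Here `plain` contains no placeholders because it has no DetailedLocation, while `padded` gets a single `NONE` standing in for the missing Subregion3.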