General Information

The Common Metadata Repository (CMR) catalogs all data and service metadata records for the EOSDIS system and will be the authoritative management system for all EOSDIS metadata.  These metadata records are registered, modified, discovered, and accessed through programmatic interfaces leveraging standard protocols and APIs.

CMR is designed to:

  • serve as a middleware replacement for the ECHO and GCMD backends. Note: users of GCMD should see no impact, as the GCMD frontend will remain as is.
  • handle metadata at the Concept level, including Collections, Granules, Visualizations, Parameters, Documentation, Services, and more.
  • manage hundreds of millions of metadata records, making them available through high-performance, standards-compliant temporal, spatial, and faceted search.
  • incorporate both human and machine metadata assessment features that work to ensure the highest quality metadata possible.
  • support multiple metadata standards using an evolvable Unified Metadata Model (UMM).

The UMM is an extensible metadata model that provides a ‘Rosetta stone’, or cross-walk, for mapping between CMR-supported metadata standards. Information detailing what the UMM is and how it works is available on Earthdata.gov.

The CMR system is composed of:

  • CMR itself (formerly ECHO)
  • GCMD
  • International Data Network (IDN)
  • Earth Science Data and Information System (ESDIS) Metrics System (EMS)
  • All related tools (internal and external)
  • All metadata, including UMM concepts, the GCMD Keywords Controlled Vocabulary, and other controlled vocabularies.

[Figure: High Level Architecture Diagram]

The CMR is designed to handle metadata at the Concept level. Collections and Granules are common metadata concepts, but this can be extended to Visualizations, Variables, Documentation, Services, and more. The CMR provides a flexible ingest system with pluggable adapters that can handle multiple metadata record formats, multiple metadata record concepts, and the relationships and validations between them. As new formats are introduced, new ingest adapters can be written for the CMR to provide ingest, validation, and search support, while response adapters provide format conversions for backward compatibility.

[Figure: Metadata at the Concept Level]
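
The pluggable adapter design described above can be sketched as follows. This is a minimal illustration in Python, not CMR's actual implementation: the registry, the `register_adapter` decorator, the format names ("format-a", "format-b"), and the field names are all hypothetical and stand in for real formats and their layouts.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a format name to a function that converts
# a raw record into a common, format-neutral dictionary.
ADAPTERS: Dict[str, Callable[[dict], dict]] = {}


def register_adapter(format_name: str):
    """Decorator that plugs an adapter into the registry under format_name."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        ADAPTERS[format_name] = func
        return func
    return decorator


@register_adapter("format-a")
def adapt_format_a(raw: dict) -> dict:
    # Illustrative field names only; not the layout of any real format.
    return {"concept": "collection", "title": raw.get("LongName")}


@register_adapter("format-b")
def adapt_format_b(raw: dict) -> dict:
    return {"concept": "collection", "title": raw.get("Entry_Title")}


def ingest(format_name: str, raw: dict) -> dict:
    """Route a record to the adapter for its format, then validate the result."""
    if format_name not in ADAPTERS:
        raise ValueError(f"no ingest adapter registered for {format_name!r}")
    record = ADAPTERS[format_name](raw)
    if not record.get("title"):
        raise ValueError("record is missing a title")
    return record
```

Under these assumptions, supporting a new format only requires registering one more adapter function; the `ingest` routine and the shared validation step stay unchanged.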

Enhanced Performance

Modern Earth Science applications strive to provide end users with nearly immediate access and interactivity across massive stores of Earth Science data. That data is discovered, navigated, and often interrogated through science metadata. As the range of applications grows and more and more information moves from the underlying science data to metadata, the challenges of navigating even just the metadata increase. CMR is designed to handle hundreds of millions of metadata records, making them available through high-performance, standards-compliant temporal, spatial, and faceted search.

The CMR builds on the work done by ECHO and the GCMD to provide a unified, authoritative repository for NASA's Earth Science metadata.

[Figure: End-User Interface Diagram]


Quality Assurance

High-performance access to metadata is only part of the problem. To be useful to a broad range of Earth Science applications, the metadata must be of high quality, complete, and consistent. The CMR incorporates both human and machine metadata assessment features that work to ensure the highest quality metadata possible. During ingest, automated metadata scoring rubrics are applied, giving data providers insight into how to make their data more discoverable or usable by end users. Science Coordinators and review teams can review metadata that fails verification or lacks required information to help providers make their metadata more consistent and complete.

[Figure: Metadata Quality Assurance Diagram]
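
To make the idea of an automated scoring rubric concrete, here is a small Python sketch. It assumes a simple weighted-completeness rubric; the field names and weights are hypothetical and are not CMR's actual scoring rules.

```python
from typing import Dict, List, Tuple

# Hypothetical rubric: field name -> weight. Not CMR's actual scoring rules.
RUBRIC: Dict[str, int] = {
    "title": 3,
    "abstract": 2,
    "temporal_extent": 3,
    "spatial_extent": 2,
}


def score_record(record: dict) -> Tuple[float, List[str]]:
    """Return a 0-1 completeness score and the rubric fields that are empty."""
    total = sum(RUBRIC.values())
    # A field counts as missing if it is absent or has an empty value.
    missing = [field for field in RUBRIC if not record.get(field)]
    earned = total - sum(RUBRIC[field] for field in missing)
    return earned / total, missing
```

The list of missing fields is the part a provider could act on: it points at what to fill in to make a record more complete.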

                                               

Consistent Metadata Representation

  1. The CMR's ingest adapter framework supports pluggable adapters which validate distinct metadata formats such as ECHO10, GCMD DIF, and ISO19115 against a common set of core metadata requirements, the Unified Metadata Model (UMM).
  2. The CMR will be able to take multiple metadata records associated with a common core concept, such as a Collection, and merge the disparate information into a robust and standards-compliant ISO19115 representation for interested clients.
  3. As additional metadata concepts are introduced to the CMR, new ingest adapters will provide verification and search indexing capabilities across diverse metadata such as visualization and parameter information.

All search and ingest requests currently made via ECHO will continue to be serviced. The ECHO API will work seamlessly with the CMR and will be backwards compatible. However, there is a period of transition from returning ECHO results to returning CMR results. During this transition period, ECHO will fan out ingest and search requests to the CMR system.

The diagram below outlines the high-level interaction between these systems:

For more information, please email support@earthdata.nasa.gov and include 'CMR' in the email subject.

Reconciliation: making sure your local provider metadata inventory matches CMR's metadata inventory

ECHO provided long-form and short-form reconciliation functionality to providers. This was a vital piece of functionality for ECHO, prior to catalog-rest, for the following reasons:

  1. ECHO ingest was asynchronous. Once you FTP'd your files, they were ingested at some point in the future, and the ingest could fail. How would you know what failed?
  2. ECHO ingest was a batch process. If a batch failed, it was tricky to figure out which part of that batch had failed.
  3. ECHO normalized the metadata in a database. It deconstructed what the provider put in and reconstructed it when it was searched for. That was an error-prone process.

Consequently, providers needed to verify that the ECHO inventory matched what they had locally using separate processes. Short form verification allowed you to match inventory at a coarse level using id matching. Long form allowed you to determine discrepancies in content of metadata.

What is important to a provider is the following:

  1. The same collection and granule metadata is represented in both systems - short form reconciliation
  2. The content of the above is the same - long form reconciliation

How can we do that in CMR?

  1. Ingesting metadata into CMR is a synchronous process, and the metadata becomes searchable within CMR within a few milliseconds. After a successful ingest, CMR returns a concept id for that piece of metadata. You can search for that piece of metadata using the concept id and/or the native id. This way you can do long-form reconciliation on a record-by-record basis.
  2. You can get the number of collections or granules in a provider using the holdings endpoint. This, in conjunction with step 1, gives you short-form reconciliation.

If step 1 fails for a piece of inventory, you can troubleshoot and repeat the exercise. If step 2 fails, then additional triage is required from the CMR Operations group.
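
The two reconciliation steps can be sketched in Python as follows. This sketch makes no calls to CMR. It assumes the provider has already fetched each record (by concept id or native id) into a dictionary and has already read a record count from the holdings endpoint. The function names and the hash-based comparison are illustrative choices, not part of any CMR client.

```python
import hashlib
from typing import Dict, List


def fingerprint(metadata: str) -> str:
    """Hash metadata content so two copies can be compared cheaply."""
    return hashlib.sha256(metadata.encode("utf-8")).hexdigest()


def long_form_mismatches(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Return native ids whose content is missing remotely or differs.

    Both arguments map a native id to the metadata text; `remote` stands in
    for whatever a per-record search by concept id or native id returned.
    """
    return sorted(
        native_id
        for native_id, metadata in local.items()
        if native_id not in remote
        or fingerprint(remote[native_id]) != fingerprint(metadata)
    )


def short_form_matches(local: Dict[str, str], remote_count: int) -> bool:
    """Compare the local record count with a count reported by the repository."""
    return len(local) == remote_count
```

An exact hash comparison is deliberately strict: it flags any difference, including whitespace. A provider may prefer to normalize or parse the metadata before comparing, depending on how closely the returned record is expected to match what was submitted.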

CMR Service Desk

Questions, comments, technical issues, and change requests should be sent to: support@earthdata.nasa.gov