Introduction

The focus of this Case Study is to identify and compare the usage of metadata elements and attributes in CMR metadata collections, as well as to identify and compare the completeness of UMM-Profile concepts in those collections. The metadata usage studies include a comparison of NASA metadata with IDN and SciOps metadata as well as an evaluation of Commonly Used Documentation Objects (CUDOs). This work updates our analysis of CMR metadata in several important ways:

1)     We retrieved new metadata records for all collections in the CMR during March 2017. This increased the size of our sample from ~4000 records from the NASA DAACs to over 32,000 records from the DAACs, SciOps, and the International Directory Network (IDN).

2)     We added a new metric to our calculations that reports the percentage of records in a metadata group (e.g. DAAC) that include a concept or item. This metric helps collection managers and shows how the various metadata elements are used. For example, we can distinguish items that occur once in every record from those that occur multiple times in some records.

3)     We developed new visualizations for comparing metadata collections and used these visualizations to compare:

  1. DAAC records in 2016 to DAAC records in 2017
  2. DAAC records to SciOps records
  3. DAAC records to IDN records
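The percent-of-records metric described in item 2 can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `item_usage`, the input layout (one list of item names per record), and the sample data are all assumptions made for the example.

```python
from collections import Counter


def item_usage(records):
    """Summarize item usage for one metadata group.

    `records` is a list with one entry per metadata record; each entry is
    the list of item names found in that record (repeats allowed).
    This layout is an assumption made for the sketch.
    """
    n_records = len(records)
    occurrences = Counter()   # total times each item occurs in the group
    records_with = Counter()  # number of records that contain the item
    for items in records:
        occurrences.update(items)
        records_with.update(set(items))
    return {
        item: {
            "occurrences": occurrences[item],
            "pct_records": 100.0 * records_with[item] / n_records,
            "per_record": occurrences[item] / records_with[item],
        }
        for item in occurrences
    }


# Hypothetical group of four records.
sample = [
    ["title", "keyword", "keyword", "keyword"],
    ["title", "keyword"],
    ["title"],
    ["title"],
]
usage = item_usage(sample)
for item, stats in sorted(usage.items()):
    print(item, stats)
```

In this made-up sample both items occur four times in total, but the percent-of-records value separates them: `title` occurs once in every record, while `keyword` occurs multiple times in only half of the records.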

UMM-Collection Completeness

We examined the completeness of the NASA and IDN metadata groups with respect to the UMM-Collection recommendation. Nine of the fifteen required elements are complete in all of these metadata collections (see Table 1).

Table 1 - UMM Concept Percent Completeness in NASA Collections

| Required Concept | % Complete | Required Concept | % Complete | Required Concept | % Complete | Required Concept | % Complete |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Metadata Dates | 100% | Abstract | 100% | Keyword | 100% | Platform Short Name | 97% |
| Resource Identifier | 100% | Data Dates | 100% | Related URL | 94% | Instrument Short Name | 93% |
| Resource Title | 100% | Processing Level | 99% | Temporal Extent | 100% | Project Name | 73% |
| Resource Version | 100% | Responsibility | 100% | Spatial Extent | 95% | | |

Summary Tables include concept names (with links to information describing the concepts in the ISO Explorer), the ISO paths used to search for the concepts, summary guidance relevant to the specific concepts, histograms that show the number of records in each collection that are missing the concept, and links to tables that show the specific records that are missing various elements.
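As a rough illustration of how an ISO path can be used to test records for a concept, the sketch below uses Python's standard `xml.etree.ElementTree`. The paths actually used in the study are not listed on this page; the two paths shown are the usual ISO 19115 (19139 encoding) locations for the abstract and the resource title and are assumptions here, as are the helper names `has_concept`, `percent_complete`, and `make_record`.

```python
import xml.etree.ElementTree as ET

# Namespaces of the ISO 19139 XML encoding.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Assumed concept-to-path mapping, relative to the gmd:MD_Metadata root.
_ID = "gmd:identificationInfo/gmd:MD_DataIdentification/"
CONCEPT_PATHS = {
    "Abstract": _ID + "gmd:abstract/gco:CharacterString",
    "Resource Title": _ID + "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
}


def has_concept(root, path):
    """True if the path matches at least one element with non-empty text."""
    return any((el.text or "").strip() for el in root.findall(path, NS))


def percent_complete(xml_records):
    """Percent of records that contain each concept in CONCEPT_PATHS."""
    roots = [ET.fromstring(xml) for xml in xml_records]
    if not roots:
        return {}
    return {
        concept: 100.0 * sum(has_concept(r, path) for r in roots) / len(roots)
        for concept, path in CONCEPT_PATHS.items()
    }


def make_record(title, abstract=None):
    """Build a tiny demo record (not a complete ISO 19115 document)."""
    abstract_xml = (
        f"<gmd:abstract><gco:CharacterString>{abstract}</gco:CharacterString></gmd:abstract>"
        if abstract else ""
    )
    return (
        f'<gmd:MD_Metadata xmlns:gmd="{NS["gmd"]}" xmlns:gco="{NS["gco"]}">'
        "<gmd:identificationInfo><gmd:MD_DataIdentification>"
        "<gmd:citation><gmd:CI_Citation><gmd:title>"
        f"<gco:CharacterString>{title}</gco:CharacterString>"
        "</gmd:title></gmd:CI_Citation></gmd:citation>"
        f"{abstract_xml}"
        "</gmd:MD_DataIdentification></gmd:identificationInfo>"
        "</gmd:MD_Metadata>"
    )


# Two hypothetical records: one with an abstract, one without.
demo = [make_record("Record A", "Some abstract text"), make_record("Record B")]
result = percent_complete(demo)
print(result)
```

Treating an element with empty text as missing is a design choice of this sketch; a stricter or looser definition of "complete" would change the percentages.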

Commonly Used Documentation Objects

All scientific documentation includes contact information for people and organizations, identifiers, references to external resources (online and offline), spatial and temporal extents, keywords, and other items that occur multiple times. ISO metadata includes standard representations for these objects (and others) and it is helpful to use these standard representations as templates throughout a metadata collection.

We examined usage of these Commonly Used Documentation Objects (CUDOs) across NASA and IDN Collections and identified a number of differences across collections. We also identified collections with more complete information that can be used as examples for guiding improvement of others.

Notes

Contact Information: Most contact information in the CMR is limited to organization names and roles, and contact information as part of the resource citation is rare. The email element is important in every contact section, but it is absent from many collections and contact sections.

Identifiers: Identifiers are complete across NASA and IDN for metadata records and for resource citations but are not consistently used for other items, e.g. platforms, instruments, missions.

Citations: Resource citations are complete in all collections. The ISO standard includes mechanisms for over thirty types of external documentation sources, e.g. algorithm descriptions, quality reports, and scientific papers. These capabilities are largely unused in CMR metadata because they generally do not exist in the primary source dialects (DIF, ECHO).

Online Resources: Most collections contain online resources for data distribution and many of those URLs have associated names. Fewer have descriptions that might help users understand the function of the URL.

Spatial Extents: Minimum bounding rectangles are the most commonly used spatial extent and they are complete in 50% of the NASA and IDN collections.

Temporal Extents: Temporal extents are generally more common than spatial extents in NASA and IDN collections.

NASA DAAC Metadata Evolution 

This report updates the metadata evaluation that we did during 2016 and provides an opportunity to identify how the CMR metadata have evolved over the year. The total number of records increased by over 50% during this time. We introduced a new visualization to summarize this comparison. Table 2 summarizes the results and provides links to Tables that show the elements that changed:

 

Table 2. Counts of completeness changes in NASA DAAC Collections - 2016-2017

| 2016 (rows) / 2017 (columns) | None | Some | All |
| --- | --- | --- | --- |
| All | | 4 | 22 |
| Some | 21 | | 5 |
| None | | 48 | |

 

The largest change identified is forty-eight elements that were introduced to the metadata during 2017. The deletion of twenty-one elements that existed in some collections in 2016 and in none during 2017 was primarily due to an improvement in the translation from the CMR into ISO. 
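A cross-tabulation of this kind can be sketched as below. The definitions are assumptions for illustration: an element is classed as "None", "Some", or "All" according to how many collections contain it in a given year, and the function names and sample counts are hypothetical rather than taken from the study.

```python
from collections import Counter


def level(n_with, n_total):
    """Classify an element by how many collections contain it."""
    if n_with == 0:
        return "None"
    if n_with == n_total:
        return "All"
    return "Some"


def transitions(before, after, n_before, n_after):
    """Count elements by (level in first year, level in second year).

    `before` and `after` map element name -> number of collections that
    contain it; `n_before` and `n_after` are the collection totals.
    An element missing from a mapping is treated as present in no collection.
    """
    counts = Counter()
    for element in set(before) | set(after):
        old = level(before.get(element, 0), n_before)
        new = level(after.get(element, 0), n_after)
        counts[(old, new)] += 1
    return counts


# Hypothetical counts for three collections in each year.
counts_2016 = {"abstract": 3, "doi": 0, "email": 1, "legacy_flag": 2}
counts_2017 = {"abstract": 3, "doi": 2, "email": 3, "legacy_flag": 0}
changes = transitions(counts_2016, counts_2017, 3, 3)
for (old, new), n in sorted(changes.items()):
    print(f"{old} -> {new}: {n}")
```

In this made-up data, `doi` moves from "None" to "Some" (an introduced element) and `legacy_flag` moves from "Some" to "None" (a deleted element), which are the two kinds of change discussed above.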

CMR Metadata Groups 

The CMR includes three groups of metadata records with separate and distinct histories and processing paths (see Table 1). The first, referred to as the NASA Collection, is made up of metadata records originally created at DAACs using the ECHO dialect. The second, referred to as the IDN Collection, includes records from major international data providers that are ingested into the CMR by SciOps. The third, referred to as SciOps, includes metadata records from over 1500 sources that originated in the Global Change Master Directory (GCMD) and the DIF dialect. Each of these collections includes sources that are analyzed separately with the expectation that they may have homogeneous characteristics. Of course, the validity of this assumption may vary with collection and source.

 

Table 1. Metadata Groups in the Common Metadata Repository (CMR)

| Group Title | # Records | Group History | Major components - # Records |
| --- | --- | --- | --- |
| NASA | 6367 | Traditional DAAC Metadata - ECHO Dialect | GES-DISC - 1044; ORNL - 1216; 18 DAAC Collections |
| IDN | 8702 | Non-NASA Collections - Managed by SciOps - Typically, DIF dialect | NOAA_NCEI - 5488; AU_AADC - 2559; 8 Miscellaneous Collections |
| SciOps (formerly GCMD) | 5465 | Miscellaneous, mostly non-NASA - DIF Dialect | NZ - 857; UCAR - 437; ACADIS - 393; Korea Polar - 329 |

 

Comparisons between these metadata groups are influenced by the fact that the collections that originate in ECHO contain much more content (406 items) than the collections that originate in DIF (175 items). Much of this content is related to additional attribute information and detailed contact information that exists in ECHO but not DIF. 

NASA vs. IDN Comparison

The IDN group includes metadata collections from many large international data producers and providers. We had anticipated that these collections might provide insight into metadata practices and priorities of these organizations. In fact, these metadata are collected and shepherded into the CMR by SciOps and it appears that they reflect SciOps metadata management practices more than they reflect the metadata practices of the originating organizations. See NASA vs. IDN for the comparison.

NASA vs. SciOps Comparison

The SciOps group includes over 13,000 metadata records that originated in the GCMD and were provided by nearly 2000 data providers, all non-IDN members. These providers are diverse and over 1700 of them have fewer than ten records. We selected twenty-five providers with more than 100 records for the comparison of NASA vs. SciOps.

 
