Element Description
The Related URLs element, when accompanied by the GET DATA sub-elements, is used to link directly to a data access point. This differs from a GET SERVICE URL, which relates to methods of subsetting and/or transforming data before obtaining it. For details concerning GET SERVICE URLs, please see the Related URLs (GET SERVICE) wiki page.
For general Related URL guidance (not specific to data access links), please refer to the Related URLs wiki page.
Best Practices
The GET DATA Related URL metadata element links a metadata record to a location on the web where data may be directly accessed. As mentioned on the Related URLs wiki page, there are several sub-elements which are used to identify the purpose of the URL. For GET DATA links specifically, best practices for these elements include:
URL Content Type: The URL Content Type is a keyword which, at a high level, describes the content of a link. This is a controlled vocabulary field maintained as an enumeration list within the UMM-Common schema. For GET DATA URLs, the URL Content Type should always be "DistributionURL".
URL Type: The URL Type is a keyword which specifies the content of a link. URL Type keywords are maintained in the Keyword Management System (KMS). For GET DATA URLs, the URL Type should always be "GET DATA".
URL Subtype: The URL Subtype is a keyword which further specifies the content of a link. Together, the URL Type and Subtype keywords create a keyword hierarchy which is used to identify the URL. Providing a Subtype for GET DATA URLs is optional, but one should be used when applicable. Subtype keywords can be found in the URL Content Type keyword list. For GET DATA links, any Subtype keyword listed under GET DATA in that hierarchy is a valid option.
Description: While not required, it is highly recommended that a description be provided for each URL in the metadata. The description should be brief and explain to the user that the link goes to a data access point. Descriptions should be unique to the link: the same description may be reused for the same type of URL across different metadata records, but it should generally not be repeated within a single metadata record. In particular, the description should be used to differentiate two GET DATA URLs that share the same URL Type and Subtype.
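Repeated descriptions within one record can be caught with a small check before submission. The sketch below is a hypothetical helper, not part of the CMR or any NASA tool, and it assumes each Related URL is available as a Python dict keyed by the UMM-Common element names (Type, Subtype, Description).

```python
from collections import Counter


def duplicate_descriptions(related_urls):
    """Return descriptions used by more than one GET DATA URL in one record.

    Hypothetical helper: `related_urls` is assumed to be a list of dicts
    shaped like UMM-Common RelatedUrls entries.
    """
    descriptions = [
        url["Description"].strip()
        for url in related_urls
        if url.get("Type") == "GET DATA" and url.get("Description")
    ]
    counts = Counter(descriptions)
    return sorted(d for d, n in counts.items() if n > 1)


# Hypothetical record: two DATA TREE links share a description.
record_urls = [
    {"Type": "GET DATA", "Subtype": "DATA TREE",
     "Description": "Access the data via HTTPS."},
    {"Type": "GET DATA", "Subtype": "DATA TREE",
     "Description": "Access the data via HTTPS."},
    {"Type": "GET DATA", "Subtype": "DIRECT DOWNLOAD",
     "Description": "Download the full data set as a single file."},
]
print(duplicate_descriptions(record_urls))
```

Any description this returns is a candidate for rewording so that each GET DATA URL in the record can be told apart.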
There are several sub-elements specifically designated for GET DATA URLs. The following provides definitions and best practices for each of the sub-elements:
RelatedUrls/GetData/Format: The format of the data provided via the associated URL. Format is an optional sub-field. Format is a controlled vocabulary field and should be chosen from the <insert GCMD data format keyword list>. If data is provided in a compressed or archive file format (e.g. .zip, .tar), it is recommended to provide the format of the data after it is uncompressed.
RelatedUrls/GetData/MimeType: The mime type of the associated URL. Mime Type is an optional sub-field. Mime Type is a controlled vocabulary field and should be chosen from the <insert GCMD mime type keyword list>.
RelatedUrls/GetData/Size: The size of the data file(s) to be downloaded via the provided URL. Size should only be provided when clicking on the URL results in an immediate download. Size need not be provided if the user must perform additional tasks or queries after clicking on the URL, such as navigating through a data directory to select a particular file to download (since in this case, the size of the download may be variable). The purpose of this element is to inform users of the size of the data that will be downloaded to their local work space if they click on the data access URL. This element only contains the numerical value of the size, not the unit. The unit (e.g. KB, MB, etc.) should be provided in the 'Unit' element.
RelatedUrls/GetData/Unit: The unit accompanying the size specified in the 'Size' element. Together, Size and Unit indicate the size of the data file(s) to be downloaded via the provided URL. Unit must be provided if 'Size' is provided. Size and Unit should only be provided if clicking on the URL results in an immediate download. Size and Unit need not be provided if the user must perform additional tasks or queries after clicking on the URL, such as navigating through a data directory to select a particular file to download (since in this case, the size of the download may be variable). Unit is a controlled vocabulary field and must be selected from the following options: KB, MB, GB, TB, PB.
RelatedUrls/GetData/Fees: The fee (if any) for ordering the data. The fee should be a number in U.S. dollars. This is an optional field.
RelatedUrls/GetData/Checksum: A value used to verify the integrity of a file or a data transfer, such as the integrity of a downloaded file. Checksum should only be provided if clicking on the URL results in an immediate download. The checksum provided in the metadata may be used to compare the original data file to the copy of the data downloaded to a user's local work space via the data access URL. The checksum provided in the metadata should be that of the original data file. The user is responsible for generating a checksum of the file on their local work space. If the checksum generated on their local work space does not match the checksum in the metadata (i.e. the original file), the data may have been altered or corrupted. Note that there are a variety of checksum algorithms available for use. Some examples include: MD5, CRC-8, CRC-16, CRC-32, Fletcher's checksum, Adler-32.
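Two of the sub-elements above involve simple computation: deriving a Size/Unit pair from a file's byte count, and comparing a downloaded file's checksum with the one in the metadata. The sketch below uses only the Python standard library, and all function names are hypothetical. It assumes binary multiples (1 KB = 1024 bytes), since this page does not say whether binary or decimal units are intended, and it assumes the metadata checksum is a hexadecimal MD5 digest; the Checksum field does not record the algorithm, so it must be known from elsewhere. `hashlib` covers MD5 and the SHA family; CRC-32 and Adler-32 are available through `zlib.crc32` and `zlib.adler32` instead.

```python
import hashlib

UNITS = ["KB", "MB", "GB", "TB", "PB"]  # allowed values of GetData/Unit


def size_and_unit(num_bytes, base=1024):
    """Express a byte count as a (Size, Unit) pair.

    Assumes binary multiples (1 KB = 1024 bytes) by default; pass base=1000
    for decimal multiples. Sizes under 1 KB come out as a fraction of a KB
    because KB is the smallest allowed unit.
    """
    if num_bytes < 0:
        raise ValueError("size cannot be negative")
    size = num_bytes / base  # start in KB
    unit_index = 0
    while size >= base and unit_index < len(UNITS) - 1:
        size /= base
        unit_index += 1
    return round(size, 1), UNITS[unit_index]


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hex digest of a local file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path, metadata_checksum, algorithm="md5"):
    """True if the downloaded copy's digest equals the checksum in the metadata."""
    return file_checksum(path, algorithm) == metadata_checksum.strip().lower()
```

For example, `size_and_unit(94003)` returns `(91.8, "KB")`. A provider would run `file_checksum` on the original file to fill in the metadata, and a user would run `matches_metadata` on their downloaded copy; a `False` result suggests the copy was altered or corrupted.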
For NASA EOSDIS data sets:
- At least one GET DATA URL is required for all NASA EOSDIS data sets.
- For NASA EOSDIS data, data access should be behind Earthdata Login authentication.
- For NASA EOSDIS data, data access should be provided via the HTTPS protocol rather than FTP.
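The HTTPS-over-FTP recommendation can be screened from the URL scheme alone. The hypothetical check below uses the standard `urllib.parse.urlparse`; it cannot tell whether a link is actually behind Earthdata Login, which has to be confirmed by following the link.

```python
from urllib.parse import urlparse


def uses_https(url):
    """True if a GET DATA URL uses the HTTPS scheme (preferred over FTP)."""
    return urlparse(url).scheme.lower() == "https"


print(uses_https("https://hydro1.gesdisc.eosdis.nasa.gov/data/FLDAS/FLDAS_VIC025_C_EA_M.001/"))
print(uses_https("ftp://example.com/pub/data/"))  # hypothetical FTP link
```

This prints `True` and then `False`.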
Examples:
URL: https://hydro1.gesdisc.eosdis.nasa.gov/data/FLDAS/FLDAS_VIC025_C_EA_M.001/
URL Content Type: DistributionURL
URL Type: GET DATA
URL Subtype: DATA TREE
Description: Use the link to access the data via HTTPS. Files are organized by date.
Format: NetCDF-4
Mime Type: text/html
Fees: 0
URL: https://daac.ornl.gov/cgi-bin/download.pl?ds_id=465&source=dsviewer
URL Content Type: DistributionURL
URL Type: GET DATA
URL Subtype: DIRECT DOWNLOAD
Description: Downloads the NPP Boreal Forest: Canal Flats, Canada, 1984, R1 data set directly to your workstation.
Format: Text File
Size: 91.8
Unit: KB
Fees: 0
Checksum: f2aa78d6825ec42d783581a8d5ea1f68
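For reference, the direct-download example above might be encoded as a UMM-Common RelatedUrls entry as follows. This is a sketch that assumes JSON key names mirroring the element paths in the Element Specification table; the values are copied from the example, and Fees is written as a string because the table types it as String.

```python
import json

# The DIRECT DOWNLOAD example, as a RelatedUrls entry.
direct_download = {
    "URL": "https://daac.ornl.gov/cgi-bin/download.pl?ds_id=465&source=dsviewer",
    "URLContentType": "DistributionURL",
    "Type": "GET DATA",
    "Subtype": "DIRECT DOWNLOAD",
    "Description": ("Downloads the NPP Boreal Forest: Canal Flats, Canada, "
                    "1984, R1 data set directly to your workstation."),
    "GetData": {
        "Format": "Text File",
        "Size": 91.8,   # numeric value only; the unit goes in "Unit"
        "Unit": "KB",
        "Fees": "0",    # String in the specification table
        "Checksum": "f2aa78d6825ec42d783581a8d5ea1f68",
    },
}

record = {"RelatedUrls": [direct_download]}
print(json.dumps(record, indent=2))
```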
Element Specification
An unlimited number of Related URLs may be listed (Cardinality: 0..*)
Model | Element | Type | Usable Valid Values | Constraints | Required? | Cardinality | Notes |
---|---|---|---|---|---|---|---|
UMM-Common | RelatedUrls/URL | String | n/a | 1 - 1024 characters | Yes | 1 | The GET DATA URL should point the user to a location where data files may be directly downloaded. |
UMM-Common | RelatedUrls/Description | String | n/a | 1 - 4000 characters | No | 0..1 | It is strongly recommended that a description be provided for each URL. |
UMM-Common | RelatedUrls/URLContentType | Enumeration | CollectionURL PublicationURL DataCenterURL DistributionURL DataContactURL VisualizationURL | n/a | Yes | 1 | "DistributionURL" is the only valid option for access links. |
UMM-Common | RelatedUrls/Type | String | KMS controlled | n/a | Yes | 1 | "GET DATA" should be provided as the Type. |
UMM-Common | RelatedUrls/Subtype | String | KMS controlled | n/a | No | 0..1 | The Type and Subtype are part of a keyword hierarchy specified in the KMS. Any Subtype listed under GET DATA in the keyword list is a valid option. If none of the available Subtypes are appropriate for the URL, the Subtype field may be left blank. |
UMM-Common | RelatedUrls/GetData/Format | String | KMS controlled | n/a | Yes | 1 | The format of the data provided via the associated URL. |
UMM-Common | RelatedUrls/GetData/MimeType | String | KMS controlled | n/a | No | 0..1 | The mime type of the associated URL. |
UMM-Common | RelatedUrls/GetData/Size | Number | n/a | n/a | Yes | 1 | The size of the data obtained via the associated URL. Size should only be provided if clicking on the URL results in an immediate download. |
UMM-Common | RelatedUrls/GetData/Unit | Enumeration | KB MB GB TB PB | n/a | Yes | 1 | Unit is required if information is provided in the 'Size' element. Size and Unit should only be provided if clicking on the URL results in an immediate download. |
UMM-Common | RelatedUrls/GetData/Fees | String | n/a | 1 - 80 characters | No | 0..1 | The fee (if any) for ordering the data. The fee should be a number in U.S. dollars. |
UMM-Common | RelatedUrls/GetData/Checksum | String | n/a | 1 - 50 characters | No | 0..1 | Checksum should only be provided if clicking on the URL results in an immediate download. Note that there are a variety of checksum algorithms available for use. |
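Several of the constraints in the table can be checked with a short script before a record is submitted. This is a hypothetical pre-submission sketch, not the CMR validation described in the next section, and it covers only part of the table: the required top-level fields, the fixed URL Content Type and Type values, the string length limits, and the Size/Unit pairing. It does not look up KMS keywords.

```python
ALLOWED_UNITS = {"KB", "MB", "GB", "TB", "PB"}

# (element path, minimum length, maximum length) from the table
LENGTH_LIMITS = [
    (("URL",), 1, 1024),
    (("Description",), 1, 4000),
    (("GetData", "Fees"), 1, 80),
    (("GetData", "Checksum"), 1, 50),
]


def _lookup(entry, path):
    """Follow a key path into nested dicts; None if any key is missing."""
    value = entry
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def check_get_data_url(entry):
    """Return a list of problems found in one RelatedUrls entry (dict)."""
    problems = []
    for key in ("URL", "URLContentType", "Type"):
        if key not in entry:
            problems.append(f"{key} is required")
    if entry.get("URLContentType", "DistributionURL") != "DistributionURL":
        problems.append('URLContentType should be "DistributionURL"')
    if entry.get("Type", "GET DATA") != "GET DATA":
        problems.append('Type should be "GET DATA"')
    for path, min_len, max_len in LENGTH_LIMITS:
        value = _lookup(entry, path)
        if value is not None and not min_len <= len(value) <= max_len:
            problems.append(f"{'/'.join(path)} must be {min_len}-{max_len} characters")
    get_data = entry.get("GetData", {})
    if "Size" in get_data and "Unit" not in get_data:
        problems.append("Unit must be provided when Size is provided")
    if "Unit" in get_data and get_data["Unit"] not in ALLOWED_UNITS:
        problems.append(f"Unit must be one of {sorted(ALLOWED_UNITS)}")
    return problems
```

An empty list means none of the checked constraints failed; it does not mean the record will pass CMR validation.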
Metadata Validation and QA/QC
All metadata entering the CMR goes through a quality process to ensure metadata quality requirements are met. All records undergo CMR validation before entering the system. The QA/QC process differs slightly for NASA and non-NASA data providers. Non-NASA providers, which include interagency and international data providers, are referred to as the International Directory Network (IDN).
Dialect Mappings
UMM Migration
None
Future Mappings
History
UMM Versioning
Version | Date | What Changed |
---|---|---|
1.10.0 | 5/2/2018 | <> |
1.9.0 | | |
ARC Documentation
Version | Date | What Changed | Author |
---|---|---|---|
1.0 | 7/26/18 | Recommendations/priority matrix transferred from internal ARC documentation to wiki space | Jeanne' le Roux |