The GET DATA Related URL metadata element allows for the linkage of a metadata record to a location on the web where data may be directly accessed. As mentioned on the Related URLs wiki page, there are several sub-elements which are used to identify the purpose of the URL. For GET DATA links specifically, best practices for these elements include:
URL Content Type: The URL Content Type is a keyword which, at a high level, describes the content of a link. This is a controlled vocabulary field maintained as an enumeration list within the UMM-Common schema. For GET DATA URLs, the URL Content Type should always be "DistributionURL".
URL Type: The URL Type is a keyword which specifies the content of a link. URL Type keywords are maintained in the Keyword Management System (KMS). For GET DATA URLs, the URL Type should always be "GET DATA".
URL Subtype: The URL Subtype is a keyword which further specifies the content of a link. Together, the URL Type and Subtype keywords create a keyword hierarchy which is used to identify the URL. Providing a Subtype for GET DATA URLs is optional, but should be used when applicable. Valid Subtypes include (as of 7/25/2018): <insert list>. Subtype keywords can be found in the URL Content Type keyword list. For access links, any Subtype keyword listed under GET DATA is a valid option.
Description: While not required, it is highly recommended that a description be provided for each URL provided in the metadata. The description should be kept brief and explain to the user that the link goes to a data access point. The descriptions should be unique to the link. While descriptions can be repeated for the same type of URL across different metadata records, it is generally advised that the same description not be repeated within the same metadata record. I.e. the description should be used to further differentiate two GET DATA URLs with the same URL Type and Subtype.
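The keyword and description guidance above can be checked mechanically. The following is a minimal sketch, assuming each Related URL is held as a Python dictionary; the key names (`URL`, `URLContentType`, `Type`, `Description`) and the function name are illustrative assumptions and are not defined on this page.

```python
from collections import Counter


def check_get_data_urls(related_urls):
    """Return a list of human-readable problems found in GET DATA entries.

    `related_urls` is assumed to be a list of dicts. The key names used
    here are illustrative assumptions, not a confirmed schema.
    """
    problems = []
    get_data = [u for u in related_urls if u.get("Type") == "GET DATA"]

    for u in get_data:
        # GET DATA URLs should always use the DistributionURL content type.
        if u.get("URLContentType") != "DistributionURL":
            problems.append(f"{u.get('URL')}: URL Content Type should be DistributionURL")
        # A description is recommended, though not required.
        if not u.get("Description"):
            problems.append(f"{u.get('URL')}: description is missing")

    # Descriptions should not repeat within one metadata record.
    counts = Counter(u["Description"] for u in get_data if u.get("Description"))
    for text, n in counts.items():
        if n > 1:
            problems.append(f"description used {n} times: {text!r}")
    return problems
```

An empty list means no issue was found. A missing description is reported only as a recommendation, since the element is not required.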
There are several sub-elements specifically designated for GET DATA URLs. The following provides definitions and best practices for each of the sub-elements:
RelatedUrls/GetData/Format: The format of the data provided via the associated URL. Format is an optional sub-field. Format is a controlled vocabulary field and should be chosen from the <insert GCMD data format keyword list>. If data is provided in a compressed file format (e.g. .zip, .tar), it is recommended that the format of the data once it is uncompressed be provided.
RelatedUrls/GetData/MimeType: The mime type of the associated URL. Mime Type is an optional sub-field. Mime Type is a controlled vocabulary field and should be chosen from the <insert GCMD mime type keyword list>.
RelatedUrls/GetData/Size: The size of the data file(s) to be downloaded via the provided URL. Size should only be provided if clicking on the URL results in an immediate download. Size need not be provided if the user must perform additional tasks or queries after clicking on the URL, such as navigating through a data directory to select a particular file to download (since in this case, the size of the download may be variable). The purpose of this element is to inform users of the size of the data that will be downloaded to their local work space when clicking on the data access URL results in an immediate download. This element only contains the numerical value of the size, not the unit. The unit (e.g. KB, MB, etc.) should be provided in the 'Unit' element.
RelatedUrls/GetData/Unit: The unit accompanying the size specified in the 'Size' element. Together, Size and Unit indicate the size of the data file(s) to be downloaded via the provided URL. Unit must be provided if 'Size' is provided. Size and Unit should only be provided if clicking on the URL results in an immediate download. Size and Unit need not be provided if the user must perform additional tasks or queries after clicking on the URL, such as navigating through a data directory to select a particular file to download (since in this case, the size of the download may be variable). Unit is a controlled vocabulary field and must be selected from the following options: KB, MB, GB, TB, PB.
RelatedUrls/GetData/Fees: The fee (if any) for ordering the data. The fee should be a number in U.S. dollars. This is an optional field.
RelatedUrls/GetData/Checksum: A value used to verify the integrity of a file or a data transfer, such as the integrity of a downloaded file. Checksum should only be provided if clicking on the URL results in an immediate download. The checksum provided in the metadata should be that of the original data file, so that it can be compared against the copy of the data downloaded to a user's local work space via the data access URL. The user is responsible for generating a checksum of the file on their local work space. If that checksum does not match the checksum in the metadata (i.e. that of the original file), the data may have been altered or corrupted. Note that a variety of checksum algorithms are available for use. Some examples include: MD5, CRC-8, CRC-16, CRC-32, Fletcher's checksum, Adler-32.
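The comparison a user performs after downloading can be sketched as follows. This is a minimal example, assuming the metadata checksum is an MD5 hex digest; the page does not state which algorithm a given record uses, so the `algorithm` default and both function names are assumptions.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Compute the hex digest of a local file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_metadata(path, expected, algorithm="md5"):
    """Return True if the local file's checksum equals the metadata value."""
    return file_checksum(path, algorithm) == expected.strip().lower()
```

A result of `False` suggests the downloaded copy differs from the original file and should be downloaded again.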
For NASA EOSDIS data sets:
- At least one GET DATA URL is required for all NASA EOSDIS data sets.
- For NASA EOSDIS data, data access should be behind URS authentication.
- For NASA EOSDIS data, it is also recommended that data access be provided via the HTTPS protocol rather than the FTP protocol.
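The protocol recommendation can be checked from the URL alone. The sketch below uses only the Python standard library; the function name and the wording of the returned messages are illustrative assumptions. It does not test whether a link is actually behind URS authentication, which cannot be determined from the URL string.

```python
from urllib.parse import urlparse


def check_access_protocol(url):
    """Classify a data access URL by its scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "https":
        return "ok"
    if scheme == "ftp":
        # FTP access is discouraged for NASA EOSDIS data.
        return "FTP is discouraged; provide HTTPS access instead"
    return f"unexpected scheme {scheme!r}; HTTPS is recommended"
```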
Examples:
URL: https://hydro1.gesdisc.eosdis.nasa.gov/data/FLDAS/FLDAS_VIC025_C_EA_M.001/
URL Content Type: DistributionURL
URL Type: GET DATA
URL Subtype: DATA TREE
Description: Use the link to access the data via HTTPS. Files are organized by date.
Format: NetCDF-4
Mime Type: text/html
Fees: 0
URL: https://daac.ornl.gov/cgi-bin/download.pl?ds_id=465&source=dsviewer
URL Content Type: DistributionURL
URL Type: GET DATA
URL Subtype: DIRECT DOWNLOAD
Description: Downloads the NPP Boreal Forest: Canal Flats, Canada, 1984, R1 data set directly to your workstation.
Format: Text File
Size: 91.8
Unit: KB
Fees: 0
Checksum: f2aa78d6825ec42d783581a8d5ea1f68
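The Size and Unit rules can be applied to an entry such as the second example. This is a minimal sketch, assuming the GET DATA sub-elements are held in a dictionary with the keys `Size` and `Unit`; the dictionary layout and the function name are assumptions made for illustration.

```python
VALID_UNITS = {"KB", "MB", "GB", "TB", "PB"}


def check_size_and_unit(get_data):
    """Return a list of problems with the Size and Unit sub-elements."""
    problems = []
    size = get_data.get("Size")
    unit = get_data.get("Unit")

    if size is not None:
        # Size holds only the number; the unit belongs in 'Unit'.
        if isinstance(size, bool) or not isinstance(size, (int, float)):
            problems.append("Size must be a numeric value without a unit")
        if unit is None:
            problems.append("Unit must be provided when Size is provided")

    # Unit is a controlled vocabulary field.
    if unit is not None and unit not in VALID_UNITS:
        problems.append(f"Unit {unit!r} is not one of {sorted(VALID_UNITS)}")
    return problems
```

For the direct download example, `check_size_and_unit({"Size": 91.8, "Unit": "KB"})` returns an empty list. The data tree example omits both elements, which is also accepted because its download size is variable.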