I need to uniquely identify datasets as well as platforms, instruments, software, and other associated resources. |
The need for identifiers in metadata records was first recognized in the DIF Standard and FGDC Remote Sensing Extensions. These standards introduced identifiers for the metadata records. In ISO 19115 this role is addressed by the fileIdentifier, a character string included in the MD_ or MI_Metadata object. This character string has been replaced with an MD_Identifier in 19115-1.
Including fileIdentifiers in the ISO metadata records gives metadata creators a mechanism for uniquely identifying them. This is becoming more important as metadata records evolve from single files into collections of related objects that can be harvested into repositories like geo.data.gov along multiple paths. There is no reliable way to identify duplicate records without a unique identifier in the actual record.
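The duplicate check described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each harvested record is available as an ISO 19139 XML string. The function names `file_identifier` and `deduplicate` are hypothetical and not part of any standard tool.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by ISO 19139 encodings of ISO 19115 and 19115-2.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}


def file_identifier(xml_text):
    """Return the fileIdentifier string of one record, or None if absent."""
    root = ET.fromstring(xml_text)
    # The root may be gmd:MD_Metadata or gmi:MI_Metadata, so search relative to it.
    node = root.find("gmd:fileIdentifier/gco:CharacterString", NS)
    if node is None or node.text is None:
        return None
    return node.text.strip() or None


def deduplicate(records):
    """Keep the first record seen for each fileIdentifier.

    Records without an identifier cannot be compared, so all of them are kept.
    """
    seen = set()
    unique = []
    for xml_text in records:
        ident = file_identifier(xml_text)
        if ident is not None:
            if ident in seen:
                continue  # same identifier already harvested along another path
            seen.add(ident)
        unique.append(xml_text)
    return unique
```

Records that lack a fileIdentifier pass through untouched, which illustrates the point made above: without an identifier in the record there is nothing reliable to compare.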
If a metadata record belongs to a parent metadata collection, the parentIdentifier field can be used to reference that collection.
Identifiers are also used to reference resources associated with the dataset or service described by the metadata. For example, platforms, instruments, software, documentation, reports, partners, and products can all be unambiguously referenced and described with the MD_Identifier object.
Digital Object Identifiers (DOIs) are most commonly used to identify and cite published datasets. In the ISO standard these identifiers should be included as an MD_Identifier in the CI_Citation for the dataset. If the metadata record itself also has a DOI, that DOI belongs in the fileIdentifier.
As DOIs become more widespread, the doi: prefix is increasingly treated as a standard identifier scheme. Browsers and other tools that recognize it can treat the string doi:10.5067/MEASURES/DMSP-F8/SSMI/DATA302 as equivalent to the URL http://dx.doi.org/10.5067/MEASURES/DMSP-F8/SSMI/DATA302. As this support becomes more common, it addresses the problem of identifiers that have no straightforward mechanism for resolution.
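The mapping from a doi: string to a resolvable URL can be sketched as follows. This is a minimal sketch rather than a complete DOI parser: the function name `doi_to_url` is hypothetical, the resolver base is the one that appears in the URL above, and the check that a DOI starts with "10." is a simplifying assumption.

```python
from urllib.parse import quote

# Resolver base taken from the example URL in the text.
RESOLVER = "http://dx.doi.org/"


def doi_to_url(identifier):
    """Turn 'doi:10.xxxx/suffix' (or a bare DOI) into a resolver URL."""
    doi = identifier.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop the scheme prefix
    if not doi.startswith("10."):
        # Assumption: every DOI begins with the '10.' directory indicator.
        raise ValueError(f"not a DOI: {identifier!r}")
    # Percent-encode characters that are unsafe in a URL path, but keep '/'.
    return RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.5067/MEASURES/DMSP-F8/SSMI/DATA302"))
```

A tool that stores the doi: form in the metadata record can apply a conversion like this at display time, so the record itself stays independent of any particular resolver address.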
|
Identifiers occur in many places in the ISO standard. The identifiers in Citations are particularly important because Citations also occur in many locations throughout the standard.
Usage | Description and Xpath |
---|---|
Quality Measure Identifier | Provides identification information for a data quality measure, such as 'Percent Missing Data'. /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmd:DQ_Element/gmd:measureIdentification |
Objective Identifier | Provides identification information for the operation objective. /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:objective/gmi:MI_Objective/gmi:citation |
There are many cases in the ISO Standard where CI_Citations and MD_Identifiers are used together to reference and identify external resources. We term these (CI_Citation+MD_Identifier)++:
Usage | Description and Xpath |
---|---|
Aggregate Citation and Identifier | (CI_Citation + MD_Identifier) + associationType + initiativeType = MD_AggregateInformation /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification \| srv:SV_ServiceIdentification/gmd:aggregationInfo/gmd:MD_AggregateInformation/gmd:aggregateDataSetName |
Software Citation and Identifier | (CI_Citation + MD_Identifier) + description + scaleDenominator + sourceReferenceSystem + sourceExtent + processedLevel + resolution + sourceStep = LE_Source /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:softwareReference |
Operation Citation and Identifier | (CI_Citation + MD_Identifier) + description + status + type = MI_Operation /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:citation |
Instrument Citation and Identifier | (CI_Citation + MD_Identifier) + description + type = MI_Instrument /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:instrument/gmi:MI_Instrument/gmi:citation |
Platform Citation and Identifier | (CI_Citation + MD_Identifier) + description + sponsor = MI_Platform /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:platform/gmi:MI_Platform/gmi:citation |
Requirement Citation and Identifier | (CI_Citation + MD_Identifier) + requestor + recipient + priority = MI_Requirement /gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:requirement/gmi:MI_Requirement/gmi:citation |
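To illustrate how the citation paths in the table can be used, the following sketch pulls the title and identifier codes out of every citation attached to a given kind of object. It is a minimal sketch under stated assumptions: the `SAMPLE` fragment is hypothetical and trimmed to the elements the example reads (a valid CI_Citation needs more), the function name `citation_identifiers` is hypothetical, and a descendant search is used so that the example does not depend on the full path from the record root.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, trimmed fragment used only for illustration.
SAMPLE = """
<gmi:MI_AcquisitionInformation
    xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmi:platform>
    <gmi:MI_Platform>
      <gmi:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>Example Platform</gco:CharacterString></gmd:title>
          <gmd:identifier>
            <gmd:MD_Identifier>
              <gmd:code><gco:CharacterString>example.org:PLATFORM-1</gco:CharacterString></gmd:code>
            </gmd:MD_Identifier>
          </gmd:identifier>
        </gmd:CI_Citation>
      </gmi:citation>
    </gmi:MI_Platform>
  </gmi:platform>
</gmi:MI_AcquisitionInformation>
"""

CODE_PATH = "gmd:identifier/gmd:MD_Identifier/gmd:code/gco:CharacterString"


def citation_identifiers(root, owner):
    """Return (title, code) pairs for citations under elements named `owner`.

    `owner` is a prefixed element name such as 'gmi:MI_Platform'.
    """
    results = []
    for citation in root.findall(f".//{owner}/gmi:citation/gmd:CI_Citation", NS):
        title = citation.findtext("gmd:title/gco:CharacterString", default="", namespaces=NS)
        for code in citation.findall(CODE_PATH, NS):
            results.append((title.strip(), (code.text or "").strip()))
    return results


print(citation_identifiers(ET.fromstring(SAMPLE), "gmi:MI_Platform"))
```

The same function works for instruments, operations, and requirements by passing a different `owner`, since each of those objects carries its citation in the same way.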
One of the ironic aspects of ISO 19115 is that the identifiers for metadata records (/gmi:MI_Metadata/gmd:fileIdentifier/gco:CharacterString and /gmi:MI_Metadata/gmd:parentIdentifier/gco:CharacterString) are CharacterStrings rather than MD_Identifiers. To help ensure uniqueness, these strings should include a namespace and a code guaranteed to be unique in that namespace. For example:
<gmd:fileIdentifier> <gco:CharacterString>gov.noaa.class:AERO100</gco:CharacterString> </gmd:fileIdentifier>.
In this case, gov.noaa.class is a namespace, and AERO100 is a code guaranteed to be unique in that namespace. Here the code is also meaningful to the data provider. Creating meaningful identifiers that remain unique over a large collection is often difficult. It might make sense to consider using UUIDs for file names and identifiers, although this takes some getting used to.
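The namespace-plus-code convention, with a UUID fallback, can be sketched as follows. This is a minimal sketch: the function names `make_identifier` and `file_identifier_element` are hypothetical, and the ':' separator simply follows the gov.noaa.class:AERO100 example above rather than any rule in the standard.

```python
import uuid
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
# Make serialized output use the conventional gmd: and gco: prefixes.
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def make_identifier(namespace, code=None):
    """Join a namespace and a code with ':'; use a random UUID if no code is given."""
    if not namespace or ":" in namespace:
        raise ValueError("namespace must be non-empty and must not contain ':'")
    if code is None:
        code = str(uuid.uuid4())  # unique, but not meaningful to a human reader
    return f"{namespace}:{code}"


def file_identifier_element(identifier):
    """Wrap an identifier string in a gmd:fileIdentifier element."""
    elem = ET.Element(f"{{{GMD}}}fileIdentifier")
    ET.SubElement(elem, f"{{{GCO}}}CharacterString").text = identifier
    return elem


meaningful = make_identifier("gov.noaa.class", "AERO100")
opaque = make_identifier("gov.noaa.class")
print(ET.tostring(file_identifier_element(meaningful), encoding="unicode"))
print(opaque)
```

The trade-off is the one noted above: a provider-assigned code is readable but has to be managed for uniqueness, while a UUID is unique without coordination but carries no meaning.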
ISO 19115-1 overcomes this challenge by changing the type of fileIdentifiers and parentIdentifiers to MD_Identifiers (see the Structure section above).
Should the fileIdentifier match the file name? There is no rule in ISO that specifies a relationship between the fileIdentifier and the file name. It is, however, very convenient to have the file name available from within the record, particularly for supporting access to the file name when transforming the XML into HTML.
Including identifiers in ISO metadata records gives metadata creators, for the first time, a mechanism for uniquely identifying metadata records and pieces of those records. The importance of unique identifiers is well known to people who use relational database management systems: they are the primary keys that identify items and make relationships possible. Identifiers are also becoming more important as metadata records are harvested into repositories like Geospatial One-Stop along multiple paths. There is no reliable way to identify duplicate records without an identifier in the actual record.
|