Recommendation:

A space-separated list of documentation DOIs should be used in the CF references attribute in Earth Science data products, both globally and for specific variables.

Recommendation Details: The CF references attribute is useful for storing information about documentation in Earth Science data products, and it can be set both globally and at the variable level. The most concise way to reference a document is via its DOI, so we suggest using a space-separated list of documentation DOIs in this attribute. Use of the URL form of the DOI (for example, https://doi.org/10.1000/182) is strongly recommended. URLs of relevant documents that do not have DOIs can also be included in the CF references attribute.
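As an illustration only, the following minimal Python sketch shows one way a producer might assemble and read back such a space-separated value. The function names (`doi_to_url`, `build_references`, `parse_references`) and the `https://example.org/atbd.pdf` document URL are hypothetical and are not part of this recommendation. Percent-encoding spaces as `%20` is an assumption made here so that whitespace can act as an unambiguous separator.

```python
# Sketch only: helper names and the example.org URL are hypothetical.
DOI_RESOLVER = "https://doi.org/"

# Prefixes that may already be attached to a DOI string.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Return the URL form of a DOI given as bare, doi:-prefixed, or URL."""
    text = doi.strip()
    lowered = text.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    # Assumption: encode any embedded spaces so that whitespace can be
    # used safely as the list separator.
    return DOI_RESOLVER + text.replace(" ", "%20")


def build_references(dois, other_urls=()):
    """Join DOI URLs and URLs of documents without DOIs, space-separated."""
    entries = [doi_to_url(d) for d in dois]
    entries += [u.strip().replace(" ", "%20") for u in other_urls]
    return " ".join(entries)


def parse_references(value):
    """Split a references attribute value back into individual entries."""
    return value.split()


references = build_references(
    ["doi:10.1000/182"],
    other_urls=["https://example.org/atbd.pdf"],  # hypothetical document
)
print(references)
print(parse_references(references))
```

The resulting string could then be stored as a global or variable-level attribute. With the netCDF4 Python library, for instance, this would typically be an assignment such as `dataset.references = references` or `variable.references = references`, although the exact call depends on the tooling the producer uses.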

Awaiting ESO Approval

This recommendation has been finalized by DIWG but has not yet received final ESO approval.

10 Comments

  1. I have an issue with this recommendation.  A dataset should have a single unique DOI that points to the landing page of the repository that hosts the data. There can also be DOIs for publications, etc., but if more than one appears in the string of the "references" attribute, a user has to look at each one to find the authoritative one. The right place for literature references would be the repository landing page that is referenced by a single DOI.

    This brings me back to the issue I have with loading granules up with a lot of collection metadata.  All that information should be available via the landing page, pointed to by a single DOI.  Dataset DOIs are the norm now, an essential facet of making data FAIR. As long as a granule carries a DOI, it doesn't have to carry a bunch of collection metadata. What's important is having the metadata that's needed to work with the data, like scaling, units, fill value, etc.

    1. This seems like an issue where it would be reasonable to allow users to choose. I take your point about a single DOI pointing to a reference list on the dataset landing page, but does this represent a single point of failure?  In terms of finding relevant information, a user looking at a list of references on a webpage might still have to go through several of them to find the information they want.  Since the references attribute can be defined at variable level, a well-structured file might actually speed up the process - it really depends how the producer structures their files and landing page. I'm not sure either option has decisive advantages, so to me the recommendation seems fine as is.

      1. There certainly could be more than one reference for individual variables.

    2. To have a perfect DOI landing page that includes all possible references would be great, but I've never seen one.

      To have the flexibility to include multiple references at the global level is a good thing, and we are recommending precisely how this should be done.

  2. See https://github.com/cf-convention/cf-conventions/issues/160

    There needs to be a standard place to put the unique dataset DOI in the metadata.  It is fine to list other documents in "references".

  3. This may be over-thinking it, but I think commas are allowed as characters in DOIs (and URLs) whereas the DOI handbook (section 2.5.2.4) states that it is mandatory for space characters to be hex encoded. Possibly a reason to use a space-separated rather than comma-separated list?

    1. I think the only special characters allowed in DOIs are colons, slashes and periods.  Here are two examples:  doi:10.1000/182 and https://doi.org/10.1000/182 .

      I think you may be referring to a full reference that includes a DOI, which can include commas, such as the example described on this Web page:  https://bowvalleycollege.libguides.com/apa-style/article-doi .

      So we need to clarify what we mean by "Referencing Documentation Using DOIs".

      Perhaps we should change the recommendation title to "Using DOIs in the references Attribute", or something like that.

      It would be better to use an array of strings for the references attribute, so that we would not have to worry about separator characters, but I do not believe string arrays have been implemented yet for attributes.


      1. Having a valid, pre-constructed URL containing the DOI is more useful than just the DOI itself, as in the example above provided by Peter Leonard.

  4. One practical constraint on implementing this recommendation is that documents and their DOIs only become available AFTER a dataset has been processed and the granules produced.  I think the phrase "when possible" should be included.

  5. The references file attribute is for publications that the science team recommends a user read (validation papers, papers describing the product, the instrument, etc.). It's not really intended for the data product DOI, as that would be circular, since the user already has the product. The data product DOI has its own file attribute.