This section contains a series of recommendations made by NASA's Earth Science Data System Working Groups (ESDSWG) Dataset Interoperability Working Group (DIWG) that are meant to increase and enhance the interoperability of Earth Science data product files. The DIWG recommendations herein embody best practices to reduce and bridge gaps between geoscience dataset formats widely used at NASA and elsewhere and to help ensure that Earth science datasets smoothly interoperate with each other regardless of their origin.
The first document, Dataset Interoperability Recommendations for Earth Science, was published in July 2016 and contains 12 recommendations. The second, Dataset Interoperability Recommendations for Earth Science: Part 2, was published in April 2019. Its recommendations continue those of 2016, with the same goal of improving the interoperability of Earth Science dataset files. Most cover new areas of interoperability, while some expand on the 2016 recommendations.
We recommend that packing attributes (i.e., scale_factor and add_offset) be employed only when data are packed as integers.
Recommendation Details: Packing refers to a lossy means of data compression that typically works by converting floating point data to an integer representation that requires fewer bytes for storage. The packing attributes scale_factor and add_offset are the attribute names that netCDF (and CF) use for the parameters of the packing and unpacking algorithms. If scale_factor is 1.0 and add_offset is 0.0, the packed value and the unpacked value are identical, although their datatype (float or integer) may differ. Unfortunately, many datasets annotate floating point variables with these attributes, apparently for completeness, even though the variables have not been packed and remain floating point values. Attaching packing attributes to data that have not been packed misuses the packing standard and should be avoided. Data analysis software that encounters packing attributes on unpacked data may be confused and behave in unexpected ways. Packed data must be represented as integers, and only integer types should have packing attributes.
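A minimal sketch of the netCDF/CF packing arithmetic, together with a check for the misuse described above. It assumes a variable is described by a NumPy-style dtype name string and a plain dict of attributes; pack, unpack, packing_attribute_problems, and INTEGER_DTYPES are hypothetical names for illustration, not part of any netCDF API.

```python
PACKING_ATTRS = ("scale_factor", "add_offset")

# Hypothetical set of dtype names treated as integer types in this sketch.
INTEGER_DTYPES = {"int8", "uint8", "int16", "uint16",
                  "int32", "uint32", "int64", "uint64"}


def pack(value, scale_factor, add_offset):
    """Pack a float into an integer using the netCDF/CF convention."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor=1.0, add_offset=0.0):
    """Unpack using the netCDF/CF convention: packed * scale_factor + add_offset."""
    return packed * scale_factor + add_offset


def packing_attribute_problems(dtype, attrs):
    """List problems when packing attributes are attached to non-integer data."""
    present = [name for name in PACKING_ATTRS if name in attrs]
    if present and dtype not in INTEGER_DTYPES:
        return [f"{name} set on {dtype} variable; only packed integer data "
                f"should carry it" for name in present]
    return []


# A temperature of 287.15 K packed with scale 0.01 and offset 273.15.
stored = pack(287.15, 0.01, 273.15)
restored = unpack(stored, 0.01, 273.15)

# The misuse the recommendation warns about: identity "packing" on floats.
print(packing_attribute_problems("float32", {"scale_factor": 1.0, "add_offset": 0.0}))
```

The check only inspects the declared type, so a float variable carrying scale_factor = 1.0 and add_offset = 0.0 is flagged even though unpacking would not change its values.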
We recommend Earth Science data products avoid using Not-a-Number (NaN) in any field values or as an indicator of missing or invalid data.
Recommendation Details: The Institute of Electrical and Electronics Engineers (IEEE) floating-point standard (IEEE 754) defines NaN (Not-a-Number) bit patterns to represent the results of illegal or undefined operations. Unless software is carefully written, NaN values can propagate silently through arithmetic or, where floating-point exceptions are trapped, halt a program. Furthermore, every ordered comparison (<, <=, >, >=) and every equality test (==) with at least one NaN operand evaluates to False, so a NaN is not even equal to itself. These properties make NaN values difficult to handle in numerical software and reduce the interoperability of datasets that contain them.
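The comparison and propagation behavior described above can be demonstrated in a few lines of Python. The sketch also shows one alternative to NaN: replacing it with an explicit fill value of the kind recorded in the CF/netCDF _FillValue attribute. The helper replace_nan and its -9999.0 default are hypothetical choices for illustration, not values prescribed by the recommendation.

```python
import math

nan = float("nan")

# IEEE 754 comparisons involving NaN are "unordered": == and every ordered
# comparison return False, while != returns True.
assert not (nan == nan)
assert nan != nan
assert not (nan < 0.0) and not (nan >= 0.0)

# NaN propagates silently: one NaN contaminates an entire sum.
assert math.isnan(sum([1.0, 2.0, nan]))


def replace_nan(values, fill_value=-9999.0):
    """Replace NaN entries with an explicit (illustrative) fill value."""
    return [fill_value if math.isnan(v) else v for v in values]


cleaned = replace_nan([1.0, nan, 3.0])
```

Because a NaN never compares equal to anything, code that tests for missing data with value == missing_marker silently fails when the marker is NaN. A finite fill value avoids that trap.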
We recommend using standardized file name extensions for HDF5 and netCDF files, as follows:
.h5 for files created with the HDF5 API;
.nc for files created with the netCDF API; and
We recommend that datasets with non-netCDF packing be clearly distinguished from datasets that use the netCDF packing convention.
Recommendation Details: Earth Science observers and modelers often employ a technique called “packing” (a.k.a. “scaling”) to make their product files smaller. "Packed" datasets must be correctly "unpacked" before they can be used properly. Confusingly, non-netCDF (e.g., HDF4_CAL) and netCDF algorithms both store their parameters in attributes with the same or similar names, yet the formulas differ: netCDF computes unpacked = packed * scale_factor + add_offset, whereas HDF4 calibration computes unpacked = scale_factor * (packed - add_offset). Unpacking data packed with one algorithm using the other therefore produces incorrect values. Many netCDF-based tools are unaware of non-netCDF (e.g., HDF4_CAL) packing and so interpret all data they read using the netCDF convention. Unfortunately, few users are aware that their datasets may be packed, and fewer know the details of the packing algorithm employed. This is an interoperability issue because it hampers data analysis performed on heterogeneous systems.
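To see why mixing the two conventions is harmful, the sketch below applies both unpacking formulas to the same stored value and the same attribute values. The netCDF formula follows the netCDF/CF packing convention; the HDF4 formula follows the calibration relationship documented for the HDF4 SDsetcal/SDgetcal routines. The function names are hypothetical.

```python
def unpack_netcdf(packed, scale_factor, add_offset):
    """netCDF/CF convention: unpacked = packed * scale_factor + add_offset."""
    return packed * scale_factor + add_offset


def unpack_hdf4_cal(packed, scale_factor, add_offset):
    """HDF4 calibration convention: unpacked = scale_factor * (packed - add_offset)."""
    return scale_factor * (packed - add_offset)


# Same stored integer, same attribute values, two different physical values.
packed, scale, offset = 100, 0.5, 10.0
print(unpack_netcdf(packed, scale, offset))    # 60.0
print(unpack_hdf4_cal(packed, scale, offset))  # 45.0
```

The two results agree only in special cases, such as when add_offset is 0.0, which is one reason the mismatch can go unnoticed until a dataset with a nonzero offset is read with the wrong convention.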
We recommend that all HDF5 Earth Science product files be made netCDF4-compatible and CF-compliant within groups.
Recommendation Details:
We recommend that user-defined group, variable, and attribute names follow the Climate and Forecast (CF) convention's specification. The names shall comply with the regular expression [A-Za-z][A-Za-z0-9_]*, that is, they must begin with a letter and contain only letters, digits, and underscores. System-defined names for any of these objects that are required by various APIs or conventions are exempt.
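A small validator for the naming rule, using Python's re module. re.fullmatch is used so the pattern must cover the entire name rather than just a prefix. The exemption set containing _FillValue is an illustrative assumption: _FillValue is a netCDF-defined attribute name that fails the pattern because it starts with an underscore, which is exactly the kind of system-defined name the recommendation exempts. A real checker would need a fuller list.

```python
import re

# Pattern from the recommendation: a letter, then letters, digits, or underscores.
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Illustrative (not exhaustive) set of system-defined names that are exempt.
EXEMPT_NAMES = frozenset({"_FillValue"})


def is_valid_user_name(name, exempt=EXEMPT_NAMES):
    """Return True if name is exempt or fully matches the recommended pattern."""
    return name in exempt or CF_NAME.fullmatch(name) is not None


for name in ["sea_surface_temperature", "Band_1", "1st_band", "lat-lon", "_FillValue"]:
    print(name, is_valid_user_name(name))
# sea_surface_temperature True
# Band_1 True
# 1st_band False
# lat-lon False
# _FillValue True
```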