We recommend that all HDF5 Earth Science product files be made netCDF4-compatible and CF-compliant within groups.
Recommendation Details:
Compatibility with netCDF4
Unlike netCDF3, netCDF4 is based on HDF5 and thus allows for the creation of group structures. Therefore, it makes sense to create group structures in netCDF4 directly, or within HDF5 products that can be read through the netCDF4 API. This can be achieved by adding dimension datasets and dimension scales that follow the netCDF data model to the HDF5 products.
Example: A dimension named Time:
A dataset named Time is created at the root level with the required values and is converted to a dimension scale with the H5DSset_scale function. This allows datasets below the root level to attach to the dimension dataset via the H5DSattach_scale function.
CF-Compliant Within Groups
The CF conventions are widely employed guidelines for Earth Science data and metadata storage. The purpose of the CF conventions is to require conforming datasets to contain sufficient metadata that they are self-describing in the following ways: Each variable in the file has an associated description of what it represents, including physical units if appropriate; and each value can be located in space (relative to Earth-based coordinates) and time. Thus, adhering to CF guidelines will increase completeness, consistency, and interoperability of conforming datasets.
Currently CF only applies to “flat files” with a single group, and not to files with multiple groups or hierarchical structures that typify modern NASA satellite datasets. Until CF is extended to apply to multiple groups, such NASA datasets can be most CF-compliant by following CF within each group.
To achieve the maximum CF-compliance within each group, we recommend the following:
A coordinates attribute that lists the coordinate dataset names must be used for each corresponding dataset. To avoid ambiguity and to take advantage of some popular CF tools (Panoply, etc.) that already support HDF5/netCDF-4 files with multiple groups, the following is recommended to handle the coordinates:
Set the NAME attribute of the dimension dataset to "This is a netCDF dimension but not a netCDF variable."
Example: Consider a variable temperature that is under the group g2, whose parent group is g1 (i.e., float /g1/g2/temperature[Dim1][Dim2]). The two-dimensional latitude and longitude fields that describe this temperature field are under the group g1 (i.e., /g1/latitude[Dim1][Dim2], /g1/longitude[Dim1][Dim2]). One should define a coordinates attribute coordinates="/g1/latitude /g1/longitude".
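The temperature example can be written out as a short h5py sketch. The group and dataset names and the value of the coordinates attribute come from the example above. The file name, the array sizes standing in for Dim1 and Dim2, the fill values, and the units strings are assumptions added for illustration.

```python
import h5py
import numpy as np


def build_coordinates_sketch():
    """Create /g1/g2/temperature with a coordinates attribute of absolute paths."""
    # In-memory file only; nothing is written to disk.
    f = h5py.File("coords_sketch.h5", "w", driver="core", backing_store=False)
    g1 = f.create_group("g1")
    g2 = g1.create_group("g2")
    shape = (2, 3)  # hypothetical sizes for Dim1 and Dim2

    # Two-dimensional latitude and longitude fields live under g1.
    lat = g1.create_dataset("latitude", data=np.zeros(shape, dtype="f4"))
    lon = g1.create_dataset("longitude", data=np.zeros(shape, dtype="f4"))
    lat.attrs["units"] = "degrees_north"  # assumed CF units string
    lon.attrs["units"] = "degrees_east"   # assumed CF units string

    # The data field lives one level deeper, under g1/g2.
    temp = g2.create_dataset("temperature", data=np.full(shape, 280.0, dtype="f4"))
    temp.attrs["units"] = "K"  # assumed units for illustration
    # Dataset.name is the absolute path, so the attribute is unambiguous
    # regardless of which group the reader starts from.
    temp.attrs["coordinates"] = " ".join([lat.name, lon.name])
    return f


coords_file = build_coordinates_sketch()
```

Using absolute paths means each name in the coordinates attribute can be resolved directly from the file root, which is the ambiguity the recommendation aims to avoid.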