Recommendation:

The fill value of a variable should be a number outside its valid data range.

Recommendation Details: The CF _FillValue attribute is used to indicate missing or invalid data for a variable.  Also, the value of the CF _FillValue attribute should match the actual fill value used for the variable in the file.

The value of the CF _FillValue attribute should be a mathematically valid number that lies outside the valid range for a variable.  Please note that NaN (Not-a-Number) is not a mathematically valid number and thus should not be used as the fill value (see Recommendation 3.7 of ESDS-RFC-036).
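As an illustration only, the check below sketches how a producer might test a candidate fill value against this rule. The function name `is_acceptable_fill_value` and its `valid_min`/`valid_max` parameters are assumptions made for this example (they mirror the CF valid_min and valid_max attributes) and are not part of the recommendation. Rejecting infinities along with NaN is likewise a choice of this sketch.

```python
import math

def is_acceptable_fill_value(fill_value, valid_min, valid_max):
    """Return True if fill_value is a finite number outside [valid_min, valid_max]."""
    # NaN is rejected per the recommendation; rejecting +/-inf as well is
    # an assumption of this sketch.
    if not math.isfinite(fill_value):
        return False
    # The fill value must not collide with any valid data value.
    return fill_value < valid_min or fill_value > valid_max

# Hypothetical variable with a valid range of 0 to 100.
print(is_acceptable_fill_value(-9999.0, 0.0, 100.0))        # outside the range
print(is_acceptable_fill_value(float("nan"), 0.0, 100.0))   # NaN
print(is_acceptable_fill_value(0.0, 0.0, 100.0))            # inside the range
```

Only the first call should report an acceptable value; the NaN and the in-range zero are both rejected.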

If possible, using zero as the fill value should be avoided, because zero looks too much like a physically realistic value, and this can be confusing to the end users.

There should only be one fill value per variable.  We recommend using a quality flag variable along with the CF flag_values and flag_meanings attributes to explain the various reasons for using the fill value, instead of using several special values in the variable.
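A minimal sketch of this arrangement follows, using plain Python lists in place of file variables. The fill value, the flag codes, and the flag meanings are all hypothetical and chosen for illustration. The only thing taken from CF is the convention that flag_meanings is a space-separated string whose entries correspond one-to-one with flag_values.

```python
FILL_VALUE = -9999.0  # hypothetical single fill value for the data variable

# Hypothetical quality-flag metadata in the style of CF flag_values / flag_meanings.
flag_values = [0, 1, 2, 3]
flag_meanings = "good cloud_contaminated night retrieval_failed"

def reason_for_fill(data, quality):
    """Map each index that holds the fill value to the meaning of its quality flag."""
    meanings = dict(zip(flag_values, flag_meanings.split()))
    return {
        i: meanings[q]
        for i, (v, q) in enumerate(zip(data, quality))
        if v == FILL_VALUE
    }

data = [12.5, FILL_VALUE, 47.0, FILL_VALUE]   # science values, one fill value only
quality = [0, 1, 0, 3]                        # companion quality flag variable
print(reason_for_fill(data, quality))
```

The data variable carries a single fill value, while the companion flag variable records why each filled cell is empty.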

Awaiting ESO Approval

This recommendation has been finalized by DIWG but has not yet received final ESO approval.

13 Comments

  1. https://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#classic_format_spec

    padding      = <0, 1, 2, or 3 bytes to next 4-byte boundary>
                                      // Header padding uses null (\x00) bytes.  In
                                      // data, padding uses variable's fill value.
                                      // See "Note on padding", below, for a special
                                      // case.
                                      // Default fill values for each type, may be
                                      // overridden by variable attribute named
                                      // '_FillValue'. See "Note on fill values",
                                      // below.
    FILL_CHAR    = \x00                                      // null byte
    FILL_BYTE    = \x81                                      // (signed char) -127
    FILL_SHORT   = \x80 \x01                                 // (short) -32767
    FILL_INT     = \x80 \x00 \x00 \x01                       // (int) -2147483647
    FILL_FLOAT   = \x7C \xF0 \x00 \x00                       // (float) 9.9692099683868690e+36
    FILL_DOUBLE  = \x47 \x9E \x00 \x00 \x00 \x00 \x00 \x00   // (double) 9.9692099683868690e+36
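The byte patterns quoted above can be checked against their stated numeric values with a short sketch like the following. It assumes only that the classic format stores numbers big-endian; the dictionary and the helper `decode_fill` are names invented for this example.

```python
import struct

# Byte patterns copied from the classic-format specification quoted above.
# The classic format is big-endian, hence the ">" prefix in each format code.
DEFAULT_FILLS = {
    "FILL_BYTE":   (">b", b"\x81"),
    "FILL_SHORT":  (">h", b"\x80\x01"),
    "FILL_INT":    (">i", b"\x80\x00\x00\x01"),
    "FILL_FLOAT":  (">f", b"\x7c\xf0\x00\x00"),
    "FILL_DOUBLE": (">d", b"\x47\x9e\x00\x00\x00\x00\x00\x00"),
}

def decode_fill(name):
    """Decode the default fill bytes for one type into a Python number."""
    fmt, raw = DEFAULT_FILLS[name]
    return struct.unpack(fmt, raw)[0]

for name in DEFAULT_FILLS:
    print(name, decode_fill(name))
```

The float and double patterns decode to the same value, 1.875 x 2^122, which is the 9.969...e+36 shown in the specification.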
    1. SiriJodha,

      I'm interested in seeing how "0–100: NDSI snow cover" is specified in the data files, since it is a value range rather than a single value.

      Thanks,

      Yaxing

  2. Perhaps remove the "There should only be one fill value per variable" paragraph, if we cannot agree.

    1. The context for "only one fill value per variable" is special values.  There can only be one _FillValue per variable.  An integer (or bit) flag variable can be used to explain the reasons for the use of the fill value.  Alternatively, several special values can be used to explain various no-data cases, but special values should not be confused with _FillValue - there is only one _FillValue per variable in HDF5 and netCDF4.

  3. NDSI_Snow_Cover has both science values and a bunch of "fill" values with special meaning

    1. I would use the term "special value" for "fill values with special meaning".  I would prefer to reserve the term "fill value" for _FillValue only.

  4. If there is more than one missing value, there are two options:

    1. Put flag_values and flag_meanings attributes directly into the data variable and use those two attributes to specify multiple missing values
    2. Create a separate flag variable that's linked with the data variable, put the flag_values and flag_meanings attributes into the flag variable to specify multiple missing values 

    Option 2 is consistent with the usage examples of flag_values given in the CF convention, but it introduces an extra variable, which means file size will increase.

    Option 1 is more efficient regarding file size, but this usage needs to be discussed with the CF team to ensure that's a proper usage of the flag_values.


  5. I always thought mixing flag values with science values was a bad idea, as in the MODIS example. It came from a time when file size was a major driver of product design. Makes plotting a hassle.  

    Come to think of it now, I would suggest we add a recommendation to the effect - use only one fill value in a science array and put all conditions leading to the missing value in a separate quality array. Note, however, that in the MODIS case, there is already a quality array with other information, some of it redundant with the special values in the science arrays.  In all, a good example of bad design.
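The separation suggested here could look roughly like the sketch below, which splits a mixed array into a science array with one fill value and a separate quality array. The fill value, the special-value codes, their meanings, and the function name are all hypothetical and do not describe any actual product.

```python
FILL_VALUE = 255  # hypothetical single fill value kept in the science array

# Hypothetical special values and their meanings; real products define their own.
SPECIAL_VALUES = {200: "missing_data", 250: "cloud"}

def split_science_and_quality(mixed, valid_min=0, valid_max=100):
    """Separate a mixed array into a science array and a quality array."""
    science, quality = [], []
    for v in mixed:
        if valid_min <= v <= valid_max:
            # A value inside the valid range is science data.
            science.append(v)
            quality.append("good")
        else:
            # Everything else becomes the single fill value; the reason
            # moves to the quality array.
            science.append(FILL_VALUE)
            quality.append(SPECIAL_VALUES.get(v, "unknown"))
    return science, quality

science, quality = split_science_and_quality([35, 250, 100, 200, 255])
print(science)
print(quality)
```

After the split, plotting code only has to mask one value in the science array, and the reasons for missing data sit in one place.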

  6. I am unsure about the inclusion of the following statement in this recommendation:  "We recommend using a quality flag variable along with the CF flag_values and flag_meanings attributes to explain the various reasons for using the fill value, instead of using several special values in the variable."  Not sure I understand how the QF attribute options available allow one to describe why a particular _FillValue was selected.

    Could perhaps another part of the recommendation also be to use a consistent _FillValue for all variables within a given file/dataset?

    Sorry, just reviewing the finalized verbiage of these recommendations now in preparation for the voting.

    1. I agree that the wording referencing the use of CF flag_values and flag_meanings is very confusing and unnecessary.

      1. I'm not sure why we're mixing flag attributes into the fill value attribute recommendation. To me, if a variable represents a flag, it should use the CF flag_values and flag_meanings attributes. Any fill value for flags should be outside the range of flag values, no different than for a regular variable. The only difference is that a temperature variable doesn't need the meaning of each value defined, because the units attribute tells you what the variable's values mean.  A flag can't do that, because it's a flag, so you must define the meaning of each flag value.

  7. The fill value is the value that the array is padded with when it is first created. Most often the fill value is the same as the value assigned to missing data, though I don't think it necessarily has to be. For example, you create an array that uses a projection in which certain grid cells will never have a value; those cells would be set to the fill value.  If the retrieval can't compute a value at some location, that cell is set to a missing value.

    Also, the HDF5 library sets the fill value using the H5D properties when the dataset is created, and it can never be changed afterwards; an attribute _FillValue may also be set (via the netCDF library) and attached to the variable (or HDF5 dataset). The missing_value attribute is attached to a variable via the netCDF or HDF5 library and can be changed later.