Use a Number Outside of the Valid Data Range for a Variable's Fill Value

Created by Peter Leonard, last modified by Steve Olding on Nov 16, 2020

Recommendation:

The fill value of a variable should be a number outside its valid data range.

Recommendation Details: The CF _FillValue attribute indicates missing or invalid data for a variable. Its value should match the actual fill value used for the variable in the file.

The value of the CF _FillValue attribute should be a mathematically valid number that lies outside the valid range for the variable. Please note that NaN (Not-a-Number) is neither a number nor mathematically valid and thus should not be used as the fill value (see Recommendation 3.7 of ESDS-RFC-036).
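The rule above can be expressed as a small check. The following is a minimal sketch, not part of the recommendation itself: the function name is hypothetical, the parameters valid_min and valid_max simply echo the CF attributes of the same names, and rejecting infinities alongside NaN is an assumption about what "mathematically valid" is meant to cover.

```python
import math


def is_acceptable_fill_value(fill_value, valid_min, valid_max):
    """Return True if fill_value is finite and lies outside [valid_min, valid_max].

    Hypothetical helper for illustration only.
    """
    # NaN is rejected per the recommendation; infinities are rejected too,
    # which is an assumption made for this sketch.
    if not math.isfinite(fill_value):
        return False
    # The fill value must not collide with any valid data value.
    return fill_value < valid_min or fill_value > valid_max


if __name__ == "__main__":
    print(is_acceptable_fill_value(-9999.0, 0.0, 100.0))
    print(is_acceptable_fill_value(float("nan"), 0.0, 100.0))
    print(is_acceptable_fill_value(0.0, 0.0, 100.0))
```

A check like this could be run by a data producer before writing a file, once the valid range of each variable is known.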

If possible, avoid using zero as the fill value: zero looks too much like a physically realistic value, which can confuse end users.

There should only be one fill value per variable. We recommend using a quality flag variable along with the CF flag_values and flag_meanings attributes to explain the various reasons for using the fill value, instead of using several special values in the variable.
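As an illustration of the paragraph above, the sketch below pairs a data array that uses a single fill value with a separate quality flag array whose meanings are described by flag_values and flag_meanings. The fill value, the flag codes, and the meanings are all hypothetical and chosen only for this example; plain Python lists stand in for file variables.

```python
FILL = -9999.0  # hypothetical fill value, outside an assumed valid range of 0..100

# Hypothetical attributes of the quality flag variable. In CF, flag_meanings
# is a single space-separated string with one word per entry in flag_values.
quality_attrs = {
    "flag_values": [0, 1, 2, 3],
    "flag_meanings": "good cloud_obscured night sensor_saturated",
}

data = [42.0, FILL, 17.5, FILL]
quality = [0, 1, 0, 3]


def explain_fills(data, quality, attrs, fill):
    """For each element, return None if it holds data, else the reason it is filled."""
    meanings = dict(zip(attrs["flag_values"], attrs["flag_meanings"].split()))
    return [meanings[q] if v == fill else None for v, q in zip(data, quality)]


if __name__ == "__main__":
    print(explain_fills(data, quality, quality_attrs, FILL))
```

The data variable stays easy to mask and plot because only one value has to be excluded, while the reasons remain available to anyone who reads the flag variable.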

Awaiting ESO Approval

This recommendation has been finalized by DIWG but has not yet received final ESO approval.

13 Comments

SiriJodha Khalsa

https://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html#classic_format_spec

padding      = <0, 1, 2, or 3 bytes to next 4-byte boundary>
               // Header padding uses null (\x00) bytes.  In
               // data, padding uses variable's fill value.
               // See "Note on padding", below, for a special
               // case.

               // Default fill values for each type, may be
               // overridden by variable attribute named
               // '_FillValue'. See "Note on fill values",
               // below.
FILL_CHAR    = \x00                                      // null byte
FILL_BYTE    = \x81                                      // (signed char) -127
FILL_SHORT   = \x80 \x01                                 // (short) -32767
FILL_INT     = \x80 \x00 \x00 \x01                       // (int) -2147483647
FILL_FLOAT   = \x7C \xF0 \x00 \x00                       // (float) 9.9692099683868690e+36
FILL_DOUBLE  = \x47 \x9E \x00 \x00 \x00 \x00 \x00 \x00   // (double) 9.9692099683868690e+36
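The byte sequences listed above can be decoded to confirm the commented values. This is a minimal sketch using Python's struct module; it assumes big-endian byte order, and the dictionary and function names are hypothetical. FILL_CHAR is omitted because it is simply a null byte.

```python
import struct

# Byte sequences copied from the listing above, paired with a struct format.
# ">" selects big-endian byte order, which is assumed here.
RAW_FILLS = {
    "FILL_BYTE": (">b", b"\x81"),
    "FILL_SHORT": (">h", b"\x80\x01"),
    "FILL_INT": (">i", b"\x80\x00\x00\x01"),
    "FILL_FLOAT": (">f", b"\x7c\xf0\x00\x00"),
    "FILL_DOUBLE": (">d", b"\x47\x9e\x00\x00\x00\x00\x00\x00"),
}


def decode_default_fills(raw=RAW_FILLS):
    """Decode each byte sequence into its numeric value."""
    return {name: struct.unpack(fmt, data)[0] for name, (fmt, data) in raw.items()}


if __name__ == "__main__":
    for name, value in decode_default_fills().items():
        print(name, value)
```

The float and double patterns both decode to 1.875 * 2**122, which is why the listing shows the same decimal value for the two types.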

Nov 20, 2019

SiriJodha Khalsa
https://nsidc.org/data/MOD10A1/versions/6 (see Table 3)
- Nov 20, 2019
1. Yaxing Wei
  SiriJodha,
  I'm interested in seeing how "0–100: NDSI snow cover" is specified in the data files, since it's a value range rather than a single value.
  Thanks,
  Yaxing
  Nov 20, 2019
Peter Leonard
Perhaps remove the "There should only be one fill value per variable" paragraph, if we cannot agree.
- Nov 20, 2019
1. Peter Leonard
  The context for "only one fill value per variable" is special values. There can only be one _FillValue per variable. An integer (or bit) flag variable can be used to explain the reasons for the use of the fill value. Alternatively, several special values can be used to explain various no-data cases, but special values should not be confused with _FillValue - there is only one _FillValue per variable in HDF5 and netCDF4.
  Nov 21, 2019
SiriJodha Khalsa
NDSI_Snow_Cover has both science values and a bunch of "fill" values with special meaning
- Nov 20, 2019
1. Peter Leonard
  I would use the term "special value" for "fill values with special meaning". I would prefer to reserve the term "fill value" for _FillValue only.
  Nov 21, 2019
Yaxing Wei
If there is more than one missing value, there are two options:
1. Put the flag_values and flag_meanings attributes directly on the data variable and use those two attributes to specify the multiple missing values.
2. Create a separate flag variable that is linked to the data variable, and put the flag_values and flag_meanings attributes on the flag variable to specify the multiple missing values.
Option 2 is consistent with the usage examples of flag_values given in the CF convention, but it introduces an extra variable, which increases file size.
Option 1 is more efficient in terms of file size, but this usage needs to be discussed with the CF team to confirm that it is a proper use of flag_values.
- Nov 20, 2019
SiriJodha Khalsa
I always thought mixing flag values with science values was a bad idea, as in the MODIS example. It came from a time when file size was a major driver of product design. It makes plotting a hassle.
Come to think of it now, I would suggest we add a recommendation to this effect: use only one fill value in a science array and put all conditions leading to the missing value in a separate quality array. Note, however, that in the MODIS case there is already a quality array with other information, some of it redundant with the special values in the science arrays. In all, a good example of bad design.
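The suggestion above can be sketched as a small conversion step. This is only an illustration under stated assumptions: the valid range of 0 to 100, the fill value of 255, the special codes in the sample input, and the function name are all hypothetical and are not taken from any product specification.

```python
FILL = 255  # hypothetical single fill value, outside the assumed valid range 0..100


def split_mixed(values, valid_min=0, valid_max=100, fill=FILL):
    """Split an array that mixes science and special values into two arrays.

    Returns (science, quality): science holds valid values or the single fill
    value; quality holds 0 for good data (an assumed convention) or the
    original special code as the reason the element is filled.
    """
    science, quality = [], []
    for v in values:
        if valid_min <= v <= valid_max:
            science.append(v)
            quality.append(0)
        else:
            science.append(fill)
            quality.append(v)
    return science, quality


if __name__ == "__main__":
    # 250 and 211 stand in for hypothetical special codes.
    print(split_mixed([35, 250, 211, 100, 0]))
```

With this layout, plotting only needs to mask one value in the science array, and the reasons for missing data live entirely in the quality array.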
- Nov 25, 2019
Vardis Tsontos
I am unsure about the inclusion of the following statement in this recommendation: "We recommend using a quality flag variable along with the CF flag_values and flag_meanings attributes to explain the various reasons for using the fill value, instead of using several special values in the variable." I am not sure I understand how the available quality flag (QF) attribute options allow one to describe why a particular _FillValue was selected.
Could another part of the recommendation be to use a consistent _FillValue for all variables within a given file/dataset?
Sorry, just reviewing the finalized verbiage of these recommendations now in preparation for the voting.
- Apr 13, 2020
1. Ed Armstrong
  I agree that the wording referencing the use of CF flag_values and flag_meanings is very confusing and unnecessary.
  Apr 15, 2020
  1. James Johnson
    I'm not sure why we're mixing flag attributes into the fill value attribute recommendation. To me, if a variable represents a flag, it should use the CF flag_values and flag_meanings attributes. Any fill value for flags should be outside the range of flag values, no different than for a regular variable. The only difference is that for a temperature variable you don't need to define the meaning of each value, as the units attribute should tell you what each of the variable's values means. A flag can't do that, as it's a flag, and thus you must define the meaning of each flag value.
    
    Apr 15, 2020
James Johnson
The fill value is the value that the array is padded with when the array is first created. Most often the fill value is the same as the value assigned to missing data, though I don't think it necessarily has to be. For example, you create an array that uses a projection where certain grid cells will never have a value. Those cells would be set to the fill value. If the retrieval can't compute a value at some location, that cell is set to a missing value.
Also, the HDF5 library sets the fill value using the H5D properties during creation, and it can never be changed afterwards; it may set an attribute _FillValue (via the netCDF library) and attach it to the variable (or HDF5 dataset). The missing_value attribute is attached to a variable via the netCDF or HDF5 library, and could be changed later.
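The distinction drawn in this comment between a fill value and a missing value can be sketched as follows. This is an illustration only: both constants, the function name, and the use of None to represent a failed retrieval are hypothetical, and a plain Python list stands in for an HDF5 or netCDF variable.

```python
FILL_VALUE = -9999.0     # assumed: padding for cells that are never written
MISSING_VALUE = -8888.0  # assumed: a retrieval was attempted but produced no value


def build_grid(n_cells, retrievals):
    """Create a grid pre-padded with the fill value, then write retrievals.

    retrievals maps a cell index to a value, or to None when the retrieval
    failed at that location.
    """
    grid = [FILL_VALUE] * n_cells  # padding applied when the array is created
    for idx, value in retrievals.items():
        grid[idx] = MISSING_VALUE if value is None else value
    return grid


if __name__ == "__main__":
    print(build_grid(5, {1: 12.5, 3: None}))
```

Both constants here lie outside any plausible valid range, so either one can be masked safely; whether to keep them distinct is the design question this comment raises.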
- Apr 15, 2020
