Consider “balanced” chunking for 3-D datasets in grid structures

Recommendation:

We recommend that "balanced" chunking be considered for three-dimensional datasets in grid structures.

Recommendation Details: When a dataset is very large, it is often more efficient to store it as a set of smaller, fixed-size blocks than as one contiguous array. This technique, known as chunking, is applied to datasets that are part of a grid structure. Each chunk is read from or written to the file as a single unit, so a request must retrieve every chunk that holds any of the requested values. As a result, the way the data are chunked can greatly affect performance for the end user. The end user's access pattern is usually unknown until the distributor has analyzed enough requests to discern a pattern, so it is difficult to determine the most effective chunking in advance.
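
To make the effect of chunk shape concrete, the Python sketch below counts how many chunks a reader must fetch for the two common requests on a 3-D (time, y, x) variable. The function name, the dimension order, and the example grid size are hypothetical. It assumes each chunk is read as a whole and ignores caching and compression.

```python
import math

def chunks_touched(var_shape, chunk_shape, access):
    """Count the chunks that must be read for one request on a 3-D
    (time, y, x) variable stored with the given chunk shape.

    access="time_series": all times at a single (y, x) point
    access="map":         all (y, x) points at a single time
    """
    nt, ny, nx = var_shape
    ct, cy, cx = chunk_shape
    if access == "time_series":
        # The point falls in one chunk along y and x, but every chunk along time is needed.
        return math.ceil(nt / ct)
    if access == "map":
        # One chunk along time is enough, but every chunk along y and x is needed.
        return math.ceil(ny / cy) * math.ceil(nx / cx)
    raise ValueError(f"unknown access pattern: {access!r}")

# Hypothetical grid: 1000 time steps on a 180 x 360 latitude/longitude grid.
shape = (1000, 180, 360)
for chunk in [(1, 180, 360), (1000, 1, 1)]:
    print(chunk, chunks_touched(shape, chunk, "time_series"),
          chunks_touched(shape, chunk, "map"))
# (1, 180, 360) 1000 1
# (1000, 1, 1) 1 64800
```

Storing each time step as one chunk makes maps cheap and time series expensive, and the opposite chunk shape reverses the trade-off. Neither suits a user whose access pattern is not known in advance.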

However, some approaches to chunking have proven broadly successful. One of them is “balanced chunking,” which chooses a chunk shape that balances access speed between the two most common geometries of requested data: time series (all times at a single location) and geographic cross-sections (all locations at a single time).
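
One way to strike this balance is sketched below in Python. It follows the general idea behind balanced chunking: if a 3-D (time, y, x) variable is split into about N chunks, divide the time axis into roughly N^(1/2) pieces and each spatial axis into roughly N^(1/4) pieces, so that one time series and one map each touch about N^(1/2) chunks. This is a simplified illustration, not Unidata's or NCO's implementation; the function name, the 4-byte value size, and the 1 MiB target chunk size are assumptions.

```python
import itertools
import math

def balanced_chunk_shape(var_shape, value_bytes=4, chunk_bytes=1_048_576):
    """Return a (time, y, x) chunk shape that roughly balances time-series
    and map reads. value_bytes and chunk_bytes are assumed defaults."""
    nt, ny, nx = var_shape
    chunk_vals = chunk_bytes / value_bytes        # values that fit in one chunk
    num_chunks = nt * ny * nx / chunk_vals        # ideal total number of chunks
    # Use a**2 chunks along time and a along each spatial axis (a**4 total),
    # so one time series and one map each touch about a**2 chunks.
    a = max(num_chunks, 1.0) ** 0.25
    ideal = (nt / a**2, ny / a, nx / a)
    # Round down, keeping each side between 1 and the dimension length.
    base = [min(max(int(v), 1), n) for v, n in zip(ideal, var_shape)]
    # Try adding 1 to every subset of axes; keep the largest shape that fits.
    best, best_vals = base, 0
    for bump in itertools.product((0, 1), repeat=3):
        cand = [min(b + d, n) for b, d, n in zip(base, bump, var_shape)]
        vals = math.prod(cand)
        if best_vals < vals <= chunk_vals:
            best, best_vals = cand, vals
    return tuple(best)

shape = (1000, 180, 360)   # hypothetical (time, lat, lon) grid of 4-byte floats
chunk = balanced_chunk_shape(shape)
print(chunk)  # (64, 45, 91)
# Chunks read for one time series and for one map:
print(math.ceil(shape[0] / chunk[0]),
      math.ceil(shape[1] / chunk[1]) * math.ceil(shape[2] / chunk[2]))  # 16 16
```

Neither request is heavily favored here: both read 16 chunks, whereas storing one full map per chunk would force a time-series read to fetch all 1000 chunks along the time axis. Real tools also weigh factors this sketch ignores, such as compression and the chunk cache.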

For example, Unidata has developed an algorithm for balanced chunking. This and other chunking algorithms are implemented in NCO's ncks and are described in greater detail in the NCO documentation. The HDF Group also provides chunking guidelines (https://www.hdfgroup.org/pubs/papers/2008-06_netcdf4_perf_report.pdf and https://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html).