
CRF, at the bottom of the diagram, needs special attention. The Cloud Raster Format (CRF) is an Esri-created raster format optimized for writing and reading large files in distributed processing and storage environments. We ran a few experiments to measure CRF performance, and the results are quite impressive: there is only a 0.4-second difference when CRF is stored on S3 instead of a local drive. This is remarkable because both THREDDS (e.g., Terra Fusion is 600X slower on S3 than on a local drive) and Hyrax (e.g., 90X slower on S3 than on EFS) perform very poorly when data are served from S3.


Terra Fusion is a NASA ACCESS 2015 project. It is the ultimate test dataset for existing netCDF-on-cloud software because its files are huge. Terra Fusion helped us find issues in several open-source projects, including Hyrax, THREDDS, and GDAL. For example, GDAL alone can't handle Terra Fusion properly: although GDAL can read data from S3 efficiently, its netCDF swath handling needs improvement. SDT is necessary to read lat/lon and reproject the swath to a grid. We subsetted MODIS data and created an aggregated netCDF file, then created a mosaic dataset directly from the aggregated netCDF. There are many interesting technical details, but the valuable lesson is that meeting the CF conventions alone is not enough to make a dataset fully interoperable with current GDAL and ArcGIS.
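Swath data are hard for grid-oriented tools because each swath pixel carries its own latitude and longitude instead of lying on a regular grid. As a rough illustration of what "reproject swath to grid" involves, here is a minimal Python sketch of one simple technique, bin averaging: every swath sample is dropped into the regular lat/lon cell that contains it, and each cell's samples are averaged. This is not how SDT, GDAL, or ArcGIS implement it. The function `swath_to_grid` and its parameters are hypothetical, and the grid is assumed to be a plain lat/lon grid whose row 0 starts at `lat_min`.

```python
from collections import defaultdict
from math import floor


def swath_to_grid(lats, lons, values, lat_min, lon_min, cell_size,
                  n_rows, n_cols, fill=None):
    """Bin-average swath samples onto a regular lat/lon grid.

    Hypothetical helper for illustration only. Row 0 corresponds to
    lat_min and column 0 to lon_min. Cells that receive no samples keep
    the `fill` value, and samples outside the grid are dropped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in zip(lats, lons, values):
        # Locate the grid cell that contains this swath pixel.
        row = floor((lat - lat_min) / cell_size)
        col = floor((lon - lon_min) / cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            sums[(row, col)] += value
            counts[(row, col)] += 1

    grid = [[fill] * n_cols for _ in range(n_rows)]
    for (row, col), total in sums.items():
        # Average all samples that fell into the same cell.
        grid[row][col] = total / counts[(row, col)]
    return grid


# Usage: three swath pixels on a 2x2 grid of 1-degree cells.
grid = swath_to_grid(
    lats=[0.2, 0.4, 1.5], lons=[0.1, 0.3, 1.7], values=[10.0, 20.0, 30.0],
    lat_min=0.0, lon_min=0.0, cell_size=1.0, n_rows=2, n_cols=2,
)
print(grid)  # [[15.0, None], [None, 30.0]]
```

Bin averaging is the simplest choice for illustration. Production tools typically offer other resampling methods (e.g., nearest neighbor or interpolation) and handle issues such as the antimeridian and polar regions that this sketch ignores.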


The two extreme approaches that Terra Fusion and Sentinel-2 took for the cloud are worth noting and comparing. What's convenient for atmospheric data scientists in a supercomputing environment may not work well for the general public in a cloud-computing environment. It's time to update the NASA Data Producer's guide and the Data Interoperability Working Group (DIWG) recommendations to address the issues that arise when data are put into the cloud. Data usage should drive the final delivery format on the cloud. Sentinel-2's approach seems better than Terra Fusion's, but we believe that CRF would be more usable than Cloud Optimized GeoTIFF (CoG) as an analysis-ready data format in the cloud.
