...
Why ArcGIS? Esri's ArcGIS Enterprise is an amazing all-inclusive turn-key solution for NASA Earthdata in both cloud and on-premise systems. It can cover the most basic services that ESDIS EOSDIS provides and can go beyond to maximize user experience.
We investigated the most commonly used workflows that can be applied to NASA CUMULUS.
DAACs play the critical role in interacting with real data users and they care most about UX.
...
ASDC infused our technology and created several image services in on-prem systems.
Although our focus was ASDC products, our workflows can be easily extended to other DAACs. We successfully collaborated with ASF and GSFC.
A good UX hinges on the availability of Analysis-Ready-Data (ARD) that can quickly deliver data that user needs. We measured the performance of ARD candidates and found out that CRF can deliver the best UX in ArcGIS. Therefore, we have developed several transformation workflows that are built around the latest GDAL.
Through BEDI, GDAL Enhancement for ESDIS (GEE) project identified an issue in handling multi-dimensional dataset and made patches. When the GEE team made a pull request after clearing NOSA, GDAL community reviewed the requests and created a new RFC 75 to generalize our patch work further. The RFC was discussed, approved, and fully implemented in GDAL 3.1.
...
Any NASA HDF data product that has transposed X and Y dimension can benefit from the new GDAL 3.1 capability. This new capability is already tested through SDT project and proven to work in cloud as well. For example, creating GeoTIFF image from MOPITT 5D dataset in a large (~30G) TERRA FUSION granule on AWS S3 was possible through the new gdalmdimtranslate command line tool.
We investigated the most commonly used workflows that can be applied to NASA CUMULUS. The workflows are illustrated in the diagram as arrows. We've developed 20 AWS Lambda functions that correspond to the arrows. When they are put together and run in steps, they can enable image services for both ArcGIS Server and GeoServer. They can be easily deployed via Serverless Framework.
In the above diagram, please note that you can create mosaic dataset or CRF directly from an aggregated, CF-compliant netCDF along time dimension.
CRF, at the bottom of the diagram, needs a special attention. The Cloud Raster Format is an Esri-created raster format that is optimized for writing and reading large files in a distributed processing and storage environment. We ran a few experiments to measure the performance of CRF and the performance of CRF is qutie amazing. There's only 0.4 second difference when CRF is put on S3 instead of local drive. This is quite remarkable because both THREDDS (e.g., Terra Fusion 600X slower Local vs. S3) and Hyrax (e.g., 90X slower EFS vs. S3) performs very poorly when data are served from S3.
Recently, we learned that Sentinel-2 data are available from AWS Open Data Registry. We wanted to know if we can replicate what Esri did with Sentinel-2. We tested CoG files for Aerosol Optical Thickness (AOT) with Lambda functions that we've developed for this project. It took only a day's effort to create an image service out of Sentinel-2 CoG data. This experiment validates that our approach can be easily adapted by NASA DAACs and end-users. It would be great if NASA Earthdata provides an authoritative ArcGIS Image Service, not Esri.
Terra Fusion is a NASA ACCESS 2015 project. Terra Fusion is an ultimate test dataset for the existing software for netCDF on cloud because file size is huge. Terra Fusion helped us to find issues in several open source projects like Hyrax, THREDDS, and GDAL. For example, GDAL alone can't handle Terra Fusion properly since netCDF swath handling needs improvement although GDAL can read data from S3 efficiently. SDT is necessary to read lat/lon and reproject swath to grid. We subsetted MODIS and created an aggregated netCDF. Then, we created a mosaic dataset directly from the aggregated netCDF. There are a lot of interesting technical details but the valuable lesson is that the meeting the CF conventions alone is not enough to make dataset fully interoperable with the current GDAL and ArcGIS.
It's worth nothing the two extreme approaches that Terra Fusion and Sentinel-2 took for cloud and compare them. What's convenient for atmospheric data scientists in super-computing environment may not work well for general public in cloud-computing environment. It's time to update NASA Data Producer's guide and Data-Interoperability-Working Group (DIWG) recommendations to addresses the issues when data are put into cloud. Data usage should drive the final delivery format on cloud. Sentinel-2's approach seems better than Terra Fusion but we believe that CRF would be more usable than CoG as analysis ready data format in cloud.
...
MDCS can publish service to Server in a specific folder but cannot to Portal in a specific folder. Use ArcPy to move the service to a specific folder in Portal.
Build CRF and serve it via ArcGIS Server for popular dataset. Although building CRF may take extra time and CRF takes up extra storage, it's worth creating for the best user experience.
...
The following sections list new features that are desirable in near future. Cloud technology has disrupted many traditional technologies and closed the doors of many high-tech companies (e.g., MapR, Hortonworks). This trend will only accelerate as cloud vendors provide more COTS and Data as a Service. For example, do we still need OPeNDAP to subset and deliver data over network while S3 can do it faster and in much larger scale? We will provide some insights for cloud-era.
Attachments |
---|