We've developed a systematic data transformation methodology that enables Web Coverage Services (WCS) and ArcGIS Image Services within the ESDIS Cumulus cloud, increasing the usability and interoperability of data in commonly used geospatial tools. Many NASA Earthdata products are distributed in scientific data archiving formats such as netCDF and HDF. However, such data is not easily consumed by end users whose GIS tools can't open the files or visualize them properly on maps. Our project goal is to narrow that gap by rapidly enabling image services that deliver scientific data as images to GIS users through cloud technology. On the cloud side, we focused on EC2 instances for image services, API Gateway for the user-facing interface, AWS Lambda functions for data transformation, and S3 object storage for data input and output, because these are core parts of ESDIS Cumulus.
This document points to how-to articles that link to specific code examples developed over the last two years. If you find any issue in our documentation or software, please submit a ticket.
This document provides insight for both DAACs and end users who want to transform NASA Earthdata formats (e.g., HDF, netCDF) into analysis-ready data (ARD) formats and geospatial data services, using technologies that run effectively in the Amazon Web Services (AWS) cloud computing environment.
A successful user experience for NASA Earthdata is about meeting each data consumer's needs at the individual level. For example, mobile app users will look for data that fits in their limited memory and storage and can be delivered through a streaming service. Alexa users will look for data that can be delivered by voice. Cloud users will look for data that can be analyzed easily with the commercial off-the-shelf (COTS) solutions provided by their cloud service vendor. Systematic transformation of the existing data is therefore necessary to enhance the user's experience, and transforming scientific data into image data is one way to do so.
Why images? An image is a format that everyone, including AI/ML, can enjoy! That's the main reason why NASA Earth Observatory publishes the Image of the Day. Images can engage not only the general public but also students, who can become future citizen Earth scientists through gamification of NASA Earthdata.
Hurricane Sally Image from ASF | Minecraft Map from Hurricane Sally Image |
---|---|
Why ArcGIS? Esri's ArcGIS Enterprise is an all-inclusive turn-key solution for NASA Earthdata in both cloud and on-premises systems. It covers the most basic services that EOSDIS provides and can go beyond them to maximize user experience.
We investigated the most commonly used workflows that can be applied to NASA CUMULUS. We've developed 20 AWS Lambda functions that can enable image services. They can be easily deployed via Serverless Framework.
ASDC infused our technology and created several image services in on-prem systems.
Although our focus was ASDC products, our workflows can be easily extended to other DAACs. We collaborated with ASF and GSFC.
Through BEDI, the GDAL Enhancement for ESDIS (GEE) project identified an issue in handling multidimensional datasets and developed patches. When the GEE team submitted a pull request after clearing NOSA, the GDAL community reviewed it and created RFC 75 to generalize our patches further. The RFC was discussed, approved, and fully implemented in GDAL 3.1.
The large GDAL community can now easily access and transform the arbitrary N-dimensional datasets found in netCDF and HDF files. The new GDAL 3.1 multidimensional APIs and tools also support group hierarchies, so users can unambiguously extract and subset data.
Any NASA HDF data product that has transposed X and Y dimensions can benefit from the new GDAL 3.1 capability. This capability has already been tested through the SDT project and proven to work in the cloud as well. For example, the new gdalmdimtranslate command-line tool was able to create a GeoTIFF image from a MOPITT 5D dataset in a large (~30 GB) TERRA FUSION granule on AWS S3.
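As a rough illustration of the kind of invocation described above, the sketch below assembles a gdalmdimtranslate command in Python. The bucket, granule, and array names are hypothetical placeholders rather than the actual TERRA FUSION layout, and the function only builds the argument list. Running the resulting command assumes GDAL 3.1 or later with S3 credentials configured for the `/vsis3/` virtual file system.

```python
import shlex


def build_mdim_translate_cmd(src, dst, array, view=None, transpose=None):
    """Build an argument list for GDAL's gdalmdimtranslate tool.

    array     : fully qualified array name, e.g. "/group/subgroup/name"
    view      : optional NumPy-style slice such as "[0,0,0,:,:]"
    transpose : optional list of axis indices giving the new axis order
    """
    spec = f"name={array}"
    if transpose is not None:
        spec += ",transpose=[" + ",".join(str(a) for a in transpose) + "]"
    if view is not None:
        spec += f",view={view}"
    return ["gdalmdimtranslate", "-of", "GTiff", "-array", spec, src, dst]


# Hypothetical example: reduce a 5D array to a single 2D slice on S3.
cmd = build_mdim_translate_cmd(
    "/vsis3/example-bucket/example_granule.h5",  # placeholder S3 object
    "/tmp/slice.tif",
    "/Example/Group/Array5D",                    # placeholder array path
    view="[0,0,0,:,:]",
)
print(shlex.join(cmd))
```

The command can then be passed to `subprocess.run(cmd, check=True)`. A 2D raster format such as GeoTIFF only works as output when the selected array ends up with two dimensions, which is why the view fixes the first three indices.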
Data transformation can be done independently of Esri ArcGIS software such as ArcPy. Using the latest GDAL Python bindings or CLI is recommended for transformation when ArcPy lacks what the latest GDAL provides.
Some ArcPy functions require Portal sign-in capability. Therefore, installing ArcGIS Portal and federating it with ArcGIS Server is recommended.
In an AWS environment, the web proxy installation can be skipped and a native AWS web service front end can be used instead.
If the input data source is not on S3 through CUMULUS, consider using OPeNDAP to subset the data. Use NcML to modify attributes and overwrite variables so that the data becomes CF-compliant.
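For the OPeNDAP subsetting step, a minimal sketch of building a DAP2 constraint-expression URL might look like the following. The server URL and variable name are made up for illustration, and the `.nc` response suffix is an assumption that depends on what the target server supports.

```python
def opendap_subset_url(base_url, variable, slices, suffix=".nc"):
    """Build a DAP2 constraint-expression URL for one variable.

    slices : list of (start, stride, stop) tuples, one per dimension.
             In DAP2 the stop index is inclusive.
    suffix : response format suffix appended to the dataset URL
             (assumed to be supported by the server).
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{base_url}{suffix}?{variable}{hyperslab}"


# Hypothetical dataset: first time step, rows 10-19, all 360 columns.
url = opendap_subset_url(
    "https://opendap.example.invalid/data/example_granule.he5",
    "ExampleVariable",
    [(0, 1, 0), (10, 1, 19), (0, 1, 359)],
)
print(url)
```

Some clients and servers expect the square brackets to be percent-encoded, so the URL may need to be passed through `urllib.parse.quote` before use.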
If ArcGIS Pro on Windows is used to create the mosaic dataset, working with a large input table is fine.
If ArcPy and MDCS on Linux or Windows Server are used, there is a limit of 99 rows when processing the input table for creating a mosaic dataset. The input source doesn't matter: neither CSV nor netCDF works.
If MDCS fails, export the input table to CSV and modify the header to standardize the key field names, such as StdZ and StdTime. Then use ArcGIS Pro to create a large mosaic dataset from the CSV file.
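The header standardization can be scripted. The sketch below rewrites only the first row of an exported CSV and copies the remaining rows unchanged. The source column names (`level`, `time`) and the sample row are hypothetical; only StdZ and StdTime come from the workflow described above.

```python
import csv
import io

# Hypothetical mapping from exported column names to standardized key fields.
FIELD_MAP = {"level": "StdZ", "time": "StdTime"}


def standardize_header(src, dst, field_map=FIELD_MAP):
    """Copy a CSV from src to dst, renaming header fields via field_map."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return
    writer.writerow([field_map.get(name, name) for name in header])
    writer.writerows(reader)  # data rows are passed through untouched


# Usage with in-memory files; real code would open files with newline="".
src = io.StringIO("Raster,level,time\na.tif,850,2020-09-15\n")
dst = io.StringIO()
standardize_header(src, dst)
print(dst.getvalue())
```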
If AWS Lambda is used for data transformation, make sure that the data size is small. Data transfer and transformation together must finish within Lambda's 15-minute maximum execution time.
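One way to stay inside that limit is to check the remaining execution time before starting each unit of work. The handler below is a minimal sketch of that idea, not one of the project's Lambda functions: `transform`, the event shape, and the one-minute safety margin are all assumptions, while `get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object.

```python
SAFETY_MARGIN_MS = 60_000  # assumed buffer; tune to the slowest single item


def transform(item):
    """Hypothetical placeholder for the real transformation step."""
    return item


def handler(event, context):
    """Process items until the remaining Lambda time drops below the margin."""
    done = []
    pending = list(event.get("items", []))
    while pending:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break  # stop early and report what is left
        done.append(transform(pending.pop(0)))
    return {"done": done, "pending": pending}
```

Anything returned in `pending` can be handed to a follow-up invocation instead of being lost to a timeout.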
Creating the input table on AWS RDS is recommended for parallel or asynchronous data transformation through Lambda. Make sure that the table has a unique key to avoid storing duplicate entries.
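The effect of the unique key can be sketched with Python's built-in sqlite3 module standing in for RDS, so that the example runs anywhere. The table and column names are hypothetical. On an actual RDS engine the equivalent statement would differ, for example `INSERT ... ON CONFLICT DO NOTHING` in PostgreSQL or `INSERT IGNORE` in MySQL.

```python
import sqlite3


def create_table(conn):
    """Create a hypothetical input table keyed by granule ID."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS granules (
               granule_id TEXT PRIMARY KEY,   -- unique key blocks duplicates
               s3_key     TEXT NOT NULL
           )"""
    )


def register(conn, granule_id, s3_key):
    """Insert a row; return True if it was new, False if already present."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO granules (granule_id, s3_key) VALUES (?, ?)",
        (granule_id, s3_key),
    )
    conn.commit()
    return cur.rowcount == 1


# Two workers reporting the same granule leave exactly one row behind.
conn = sqlite3.connect(":memory:")
create_table(conn)
register(conn, "granule-001", "output/granule-001.tif")
register(conn, "granule-001", "output/granule-001.tif")
print(conn.execute("SELECT COUNT(*) FROM granules").fetchone()[0])
```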
Convert any mosaic dataset into CRF to optimize service performance, especially in the cloud. It costs extra for storage, but the improvement in user experience justifies it.
MDCS can publish a service to a specific folder on Server but not to a specific folder in Portal. Use ArcPy to move the service to a specific folder in Portal.
The following sections list new features that are desirable in the near future.