Vision for Legacy HDF (.he4), SMAP and ICESat-2 migration to the Cloud
Possible Feature Roadmap - Highly subjective, Order/Content subject to change.
(Essentially this is a "gap" analysis - what needs to be provided in cloud services to cover what is available today in on-prem services).
- General: enable, configure, and test the main service for each dataset
- Trajectory Subsetter - Available as a cloud data service and mostly pre-configured for SMAP & ICESat-2. Should only require metadata connections.
- HOSS - Minor updates for SMAP, ICESat-2 special cases (e.g., TRT-290 - HOSS support for SMAP L3)
- Missing CF for SMAP (configuration files already accommodate, mostly)
- Note: HOSS (via OPeNDAP) produces NetCDF-4 output even for HDF-5 inputs, though .nc4 is built upon .h5. We will likely end up with .nc4 as the default output format, preferred over "plain" .h5 in all cases. "Plain" (native) .h5 is not available for processed outputs.
- Lower-Left Orientation - is a known issue, not currently supported in HOSS.
- Edge aligned dimension scales - also a known issue.
- SMAP-to-GeoTIFF - Special case in Trajectory Subsetter for SMAP data
- Could be that this just needs existing code enabled for in-cloud deployment
- However, it is likely we will recommend a revised implementation - SMAP to Gridded HDF-5 format + Common HDF-5 to GeoTIFF/other (see following) - SMAP_L2_Grid.
- Harmony-GDAL-Formatter - GeoTIFF/COG outputs - ICESat-2 & SMAP (L3/L4, .h5 Format) inputs
- Refactor HGA - split archive, 2 or 3 separate services - HGF, HGA, HGAgg, (TRT-213 - Images for GITC?)
- COG formatter - GeoTIFF source (1st?), NetCDF source (2nd?)
- See HGA Roadmap
- L1/L2 Formats (mostly custom - port on-prem code to in-cloud):
- Shapefile (Vector data, ESRI-based file format, Custom code exists, GDAL?)
- NetCDF (Custom upgrade .h5 to .nc4 with CF support, L1/L2, port to in-cloud)
- ASCII (CSV, Tabular, other? - Custom code exists, GDAL?)
- kmz, kml (GDAL + custom front-end) - requirements?
- (Possible Trajectory Subsetter rewrite - Python - as a maintenance issue)
- L3/L4 Reprojection - PyResample and/or GDAL based?
- HGA-Reprojection? (SMAP focus, currently GDAL based, but not clear if GDAL specific option is best path)
- Projection Gridding for Trajectory Data (based on Swath-Projector, e.g. for ICESat-2) ?
- Minor fix for 1D vs 2D inputs
- Not specifically an on-prem feature, but from use-case analysis - an opportunity
- Legacy HDF-4 file handling (HDF-4 to NetCDF/CF converter as pre-step, OPeNDAP?)
Feature Summary:
- L1/L2 - Trajectory Subsetter - SMAP special case - Gridded outputs
- Enable in existing code base for Cloud deployment
- Gridded outputs to .h5 (Cropped grid, Revive un-merged branch? Not-GeoTIFF as on-prem)
- GeoTIFF formatter (COG, HGA) - chaining. (Alternative is enabling existing on-prem Gridding to GeoTIFF functionality).
- Reprojection special case? (HGA?, chaining)
- L1/L2 - Trajectory Subsetter + general chaining for formats (ICESat-2, SMAP, other?)
- Projection - use Swath-Projector (minor updates likely, 1D inputs)?
- Potential use cases for Oblique Equal-Area, Oblique Mercator, or Oblique Equidistant projections
- Note: SMAP is a spiraling trajectory and is already "gridded" with CEA row/col data
- Not an existing on-prem feature
- Reformat: Vector Data: Shapefile, KML, NetCDF, multi-file ASCII
- as micro services, lifted mostly from on-prem implementations (or port/rewrite) - (Note size constraints, e.g., ICESat-2 files)
- NetCDF: CF, metadata upgrade from .h5 (on-prem implementation)
- ShapeFile, KML format output (.shp, .kml, .kmz)
- ASCII format output? - rewrite to Python, chaining - single file per variable
- CSV
- Choice of delimiters? (Comma, tab)
- “Row of vars” per time-stamp? (see Tabular ASCII)
- “Time, Name, Value” format
or “Time, Value” per variable, separate files (variable name as filename?) - GDAL standard?
- “Tabular-ASCII” - lifted from on-prem, or rewritten to Python
- Textual layout
- “Row of vars” per time-stamp
- Blocked per group and/or frequency (count)
- OPeNDAP / GDAL (HGA): various ASCII format choices ?
- L3/L4 - HOSS + chaining (ICESat-2, SMAP other?) √ (some reconfig, perhaps tweaks to code?)
(Note - output will be .nc4, not the original source .h5 format, but hopefully this is not too significant)
- ICESat-2 has some notorious quirkiness - Lower-Left orientation, Edge-Aligned dimension scales
- Possible support for missing dimension-scales - use CF-ish config entry for grid-parameters (SMAP)
- see: GeoTagging for a Geo-Located Grid Array
- Reprojection - TBD (I suspect not HGA, rather PyResample based, Data Reprojection Wizard)
- Reformat - GeoTIFF/COG (HGA?), other standard Cloud formats, Zarr
- HDF-EOS-4 legacy data - multiple divergent data products
- HDF4 - NetCDF-4 pre-processor.
- OPeNDAP functionality?
- LADDS GeoLoco/Band-Subsetter
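The "missing dimension-scales" item above (grid-parameters supplied through a config entry) could be sketched roughly as follows. This is a minimal illustration only: the `GridParameters` fields and the `dimension_scales` helper are hypothetical names, not the actual HOSS configuration schema, and the grid values are made up rather than taken from a real SMAP grid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GridParameters:
    """Hypothetical config entry describing a projected grid (illustrative names)."""
    x_min: float      # x coordinate of the grid's left (west) edge, in meters
    y_max: float      # y coordinate of the grid's top (north) edge, in meters
    cell_size: float  # cell width and height, in meters
    n_cols: int
    n_rows: int


def dimension_scales(grid: GridParameters) -> Tuple[List[float], List[float]]:
    """Return cell-center x and y coordinate arrays derived from grid parameters."""
    half = grid.cell_size / 2.0
    x = [grid.x_min + half + i * grid.cell_size for i in range(grid.n_cols)]
    # Row 0 is assumed to be the top row, so y decreases with the row index.
    y = [grid.y_max - half - j * grid.cell_size for j in range(grid.n_rows)]
    return x, y


# Toy 4 x 3 grid with 1000 m cells (assumed values, not a real product grid).
toy = GridParameters(x_min=0.0, y_max=3000.0, cell_size=1000.0, n_cols=4, n_rows=3)
x, y = dimension_scales(toy)
print(x)  # [500.0, 1500.0, 2500.0, 3500.0]
print(y)  # [2500.0, 1500.0, 500.0]
```

The resulting arrays are what a service would attach as dimension scales (coordinate variables) when the source file does not carry them.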
Some preliminary thoughts:
- Separate support for Projection (Swath-Projection-Gridding) ?
- Vs. Regridding (same projection)
- Vs. Reprojection (Equal-Area to Conformal, Cartesian to Polar, others)
- Feature expansion in Cloud Services to match on-prem
- Less any no-longer-relevant features (H5-EOS)
- “Main Services”: HOSS for L3/L4, Trajectory Subsetter for L1/L2
- For Variable and Spatial/Temporal Subsetting
- BBox and Shape/Polygon Region
- HDF-5, NetCDF-4
- + Formats, + Projection/Reprojection
- Alternative for Legacy (HDF4, HDF-EOS-2)?
- Back-Port to on-prem - Harmony-on-prem within SDPS ?
- To potentially retire legacy code base for data not yet migrated
- Open port on EIL for a localized Harmony access?
- EDSC/CMR reconfiguration
SMAP L1/L2 -
- Trajectory Subsetter √ (done)
- GeoTIFF special case - minor adapter enablement
- + some coding refinement to enable GeoTIFF outputs
- -or- Refactor as separate micro service (SMAP-to-GeoTIFF)?
- -or- Gridding as an option within HDF-5 format (SMAP-to-Grid)
- We have this functionality in our archives, even if it is not immediately accessible.
- + GeoTIFF formatter (HGA) √ (preferred, I think)
- Reprojection (chaining): GeoTIFF/COG => HGA, other?
- .he5, NetCDF format output (.nc4)? - micro services - lifting from on-prem is possible
- from GeoTIFF, w/ aggregation
- -or- from HDF-5 w/ Grid & Trajectory handling?
- again, available from on-prem if useful.
- ShapeFile format output (.shp) - as a micro service,
- lifted from on-prem is an option
- ASCII format output? - rewrite to Python, chaining
- CSV
- Choice of delimiters? (Comma, tab)
- “Row of vars” per time-stamp? (see Tabular ASCII)
- “Time, Name, Value” format
or “Time, Value” per variable? - GDAL standard?
- “Tabular-ASCII” - lifted from on-prem, or rewritten
- Textual layout
- “Row of vars” per time-stamp
- Blocked per group and/or frequency (count)
- OPeNDAP / GDAL (HGA) various ASCII format choices ?
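To make the CSV layout choices above concrete, here is a minimal sketch of the two candidate layouts ("Row of vars" per time-stamp versus "Time, Name, Value") with a selectable delimiter. It uses only the Python standard library; the function names, variable names, and sample values are assumptions for illustration and are not taken from the on-prem code.

```python
import csv
import io
from typing import Dict, Sequence


def write_wide(times: Sequence[str], variables: Dict[str, Sequence[float]],
               delimiter: str = ",") -> str:
    """'Row of vars' layout: one row per time-stamp, one column per variable."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    names = list(variables)
    writer.writerow(["time"] + names)
    for i, t in enumerate(times):
        writer.writerow([t] + [variables[n][i] for n in names])
    return buf.getvalue()


def write_long(times: Sequence[str], variables: Dict[str, Sequence[float]],
               delimiter: str = ",") -> str:
    """'Time, Name, Value' layout: one row per (time-stamp, variable) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["time", "name", "value"])
    for name, values in variables.items():
        for t, v in zip(times, values):
            writer.writerow([t, name, v])
    return buf.getvalue()


# Made-up sample data; the variable names are hypothetical.
times = ["2020-01-01T00:00:00Z", "2020-01-01T00:00:01Z"]
variables = {"soil_moisture": [0.21, 0.22], "surface_temp": [281.5, 281.7]}

wide_csv = write_wide(times, variables)
long_csv = write_long(times, variables)
tab_csv = write_wide(times, variables, delimiter="\t")
print(wide_csv)
print(long_csv)
```

The wide layout keeps one line per time-stamp but requires all variables to share the same time axis; the long layout tolerates variables of differing lengths or frequencies at the cost of larger files, which matters given the size constraints noted for ICESat-2.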
Re. ICESat-2 L1/L2 - multiple divergent data products
L1/L2 (Trajectory) -
- Trajectory Subsetter √ (done)
- No gridding, reprojection, or NetCDF format requirements
- ShapeFile (.shp) and ASCII formats (see above), but with size constraints
SMAP L3/L4 - including 3D / Multi-Dimension (?)
- HOSS Updates for projected-grid data. √ (done)
- HOSS - with configuration file entries, minor updates?
- SMAP is notoriously lacking in CF content
- NetCDF format output (.nc4)
- default for HOSS (common .h5 to .nc4 “fixer”/Formatter - built-in to OPeNDAP!)
- An issue if HDF-5 is desired - close compliance with source data, possibly just a file extension change?
- GeoTIFF / COG format - HGA chaining? (I think so)
- Regridding/Reprojection feature (TBD)
- ASCII format? (which ASCII)
- OPeNDAP has an option, I believe
- Other choices, HGA, chaining?
ICESat-2 - ATLAS - L3/L4 (Gridded, mix of projections, some subtle grid issues) -
- HOSS - with configuration file entries, minor updates?
- ICESat-2 has some notorious quirkiness - Lower-Left orientation, Edge-Aligned dimension scales
- NetCDF - default for HOSS (common .h5 to .nc4 “fixer”/Formatter - built-in to OPeNDAP!)
- To GeoTIFF / COG - HGA chaining?
- Regridding/Reprojection feature (TBD)
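The two ICESat-2 grid quirks named above can be illustrated with a small sketch. It assumes that "edge-aligned" means the dimension scale stores N+1 cell-edge coordinates for N cells, and that "lower-left orientation" means row 0 of the array is the southernmost row; both readings are assumptions, and the helper names are hypothetical rather than part of HOSS.

```python
from typing import List, Sequence, Tuple


def edges_to_centers(edges: Sequence[float]) -> List[float]:
    """Convert N+1 cell-edge coordinates into N cell-center coordinates."""
    return [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]


def to_upper_left(rows: Sequence[Sequence[float]],
                  y: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Reorder rows so the first row is the northernmost (largest y)."""
    if len(y) > 1 and y[0] < y[-1]:
        # y ascends with row index, i.e. lower-left origin: flip both together.
        return [list(r) for r in reversed(rows)], list(reversed(y))
    return [list(r) for r in rows], list(y)


# Toy 3 x 2 grid with made-up values; row 0 is the southernmost row.
y_edges = [0.0, 10.0, 20.0, 30.0]
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

y_centers = edges_to_centers(y_edges)
flipped, y_flipped = to_upper_left(data, y_centers)
print(y_centers)  # [5.0, 15.0, 25.0]
print(flipped)    # [[5.0, 6.0], [3.0, 4.0], [1.0, 2.0]]
print(y_flipped)  # [25.0, 15.0, 5.0]
```

Keeping the data rows and the y coordinate in the same order is the important part: flipping one without the other silently mirrors the grid.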
HDF-EOS-4 legacy data - multiple slightly divergent data products
- HOSS - with configuration file entries, minor updates?
- Assuming OPeNDAP updates - not sure how feasible
- HDF4 - NetCDF pre-processor.
- Note: .nc4 is the default output for HOSS (common .h5 to .nc4 “fixer”/Formatter - built-in to OPeNDAP!), so compliance of processed outputs with source data may be an issue.
- ?? HEG rewrite ?? (Unlikely, except possibly to capture unique reprojection/interpolation handling)
- Regridding/Reprojection feature (TBD)
- GeoTIFF - HGA