Vision for Legacy HDF (.he4), SMAP and ICESat-2 migration to the Cloud
Possible Feature Roadmap - highly subjective; order and content subject to change.
- General: enable, configure, and test the main service for each dataset
- Trajectory Subsetter - mostly pre-configured for SMAP & ICESat-2
- HOSS - Special cases for SMAP, ICESat-2 (TRT-290)
- HGF-COG - SMAP (L3/L4 Formats)
- Refactor HGA - split archive into 2 or 3 separate services: HGF, HGA, HGAgg (TRT-213?)
- COG formatter - GeoTIFF source (1), NetCDF source (2)
- HGA-Reprojection (SMAP focus, currently GDAL based)
- Swath-Projector for Trajectory Data
- 1D vs 2D inputs
- SMAP special cases (spiraling trajectory, but already gridded, and ICESat-2 ?)
- L1/L2 Formats (mostly custom):
- Shapefile (Vector, Custom, GDAL?)
- NetCDF
- ASCII (CSV, Tabular, other?)
- kml, kmz (GDAL)
- L3/L4 Reprojection
- Legacy HDF-4 file handling (HDF-4 to NetCDF, OPeNDAP?)
Feature Summary:
- L1/L2 - Trajectory Subsetter - SMAP special case - Gridded outputs
- Enable in existing code base for Cloud deployment
- Gridded outputs to .h5 (cropped grid; revive un-merged branch? Not GeoTIFF, as is currently the case)
- GeoTIFF formatter (COG, HGA) - chaining
- Reprojection special case? (HGA?, chaining)
- L1/L2 - Trajectory Subsetter + general chaining for formats (ICESat-2, SMAP, other?) √
- Projection - use Swath-Projector (minor updates likely, 1D inputs)
- (Potential use cases for Oblique Equal-Area, Oblique Mercator, or Oblique Equidistant projections)
- Note SMAP is spiraling trajectory and is already "gridded" with CEA row/col data
- (Not an existing on-prem feature)
- Reformat: Vector Data: Shapefile, KML, NetCDF, multi-file: ASCII
- As microservices, lifted mostly from on-prem implementations (or ported/rewritten). Note size constraints, e.g., ICESat-2 files.
- NetCDF: CF, metadata upgrade from .h5
- Shapefile, KML format output (.shp, .kml, .kmz)
- ASCII format output? - rewrite to Python, chaining - single file per variable
- CSV
- Choice of delimiters? (Comma, tab)
- “Row of vars” per time-stamp? (see Tabular ASCII)
- “Time, Name, Value” format, or “Time, Value” per variable (name)? - GDAL standard?
- “Tabular-ASCII” - lifted from on-prem, or rewritten to Python
- Textual layout
- “Row of vars” per time-stamp
- Blocked per group and/or frequency (count)
- OPeNDAP / GDAL (HGA): various ASCII format choices ?
- L3/L4 - HOSS + chaining (ICESat-2, SMAP, other?) √ (some reconfig, perhaps tweaks to code?)
(Note: output will be .nc4, not the originally scoped .h5, but hopefully the difference is not significant)
- ICESat-2 has some notorious quirks - Lower-Left orientation, Edge-Aligned dimension scales
- Possible support for missing dimension-scales - use CF-ish config entry for grid-parameters (SMAP)
- see: GeoTagging for a Geo-Located Grid Array
- Reprojection - TBD (I suspect not HGA, rather PyResample based, Data Reprojection Wizard)
- Reformat - GeoTIFF/COG (HGA?), other standard Cloud formats, Zarr
- HDF-EOS-4 legacy data - multiple divergent data products
- HDF4 to NetCDF-4 pre-processor.
- OPeNDAP functionality?
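
The two ASCII layouts discussed under Reformat above (“Row of vars” per time-stamp versus “Time, Name, Value”) could look roughly like the sketch below. It is only an illustration, not the on-prem implementation: the record structure, the function names `write_wide` and `write_long`, and the sample variables (`lat`, `height`) are all hypothetical. The delimiter is a parameter, to reflect the comma/tab choice.

```python
import csv
import io

# Hypothetical sample data: one dict of variable values per time-stamp.
RECORDS = [
    {"time": "2020-01-01T00:00:00Z", "lat": 10.5, "height": 101.2},
    {"time": "2020-01-01T00:00:01Z", "lat": 10.6, "height": 101.9},
]


def write_wide(records, delimiter=","):
    """'Row of vars' layout: one row per time-stamp, one column per variable."""
    names = [key for key in records[0] if key != "time"]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["time"] + names)
    for rec in records:
        writer.writerow([rec["time"]] + [rec[name] for name in names])
    return buf.getvalue()


def write_long(records, delimiter=","):
    """'Time, Name, Value' layout: one row per (time-stamp, variable) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["time", "name", "value"])
    for rec in records:
        for name, value in rec.items():
            if name != "time":
                writer.writerow([rec["time"], name, value])
    return buf.getvalue()
```

The wide layout keeps one line per time-stamp, which suits variables sharing a common time axis; the long layout tolerates variables sampled at different frequencies, at the cost of repeating the time-stamp on every row.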
Some preliminary thoughts:
- Separate support for Projection (Swath-Projection-Gridding) ?
- Vs. Regridding (same projection)
- Vs. Reprojection (Equal-Area to Conformal, Cartesian to Polar, others)
- Feature expansion in Cloud Services to match on-prem
- Less any no-longer-relevant features (H5-EOS)
- “Main Services”: HOSS for L3/L4, Trajectory Subsetter for L1/L2
- For Variable and Spatial/Temporal Subsetting
- BBox and Shape/Polygon Region
- HDF-5, NetCDF-4
- + Formats, + Projection/Reprojection
- Alternative for Legacy (HDF4, HDF-EOS-2)?
- Back-Port to on-prem - Harmony-on-prem within SDPS ?
- To potentially retire legacy code base for data not yet migrated
- Open port on EIL for a localized Harmony access?
- EDSC/CMR reconfiguration
...
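
As a rough illustration of the BBox spatial subsetting mentioned above for L1/L2 trajectory data, the sketch below masks 1D longitude/latitude arrays against a bounding box. It is not the Trajectory Subsetter's actual code: the function names `bbox_mask` and `subset`, and the (west, south, east, north) ordering, are assumptions made for the example.

```python
def bbox_mask(lons, lats, bbox):
    """Return one boolean per point: True if it lies inside the bounding box.

    bbox is assumed to be (west, south, east, north) in degrees. A box with
    west > east is treated as crossing the antimeridian.
    """
    west, south, east, north = bbox
    mask = []
    for lon, lat in zip(lons, lats):
        in_lat = south <= lat <= north
        if west <= east:
            in_lon = west <= lon <= east
        else:
            # Box wraps across +/-180 degrees longitude.
            in_lon = lon >= west or lon <= east
        mask.append(in_lat and in_lon)
    return mask


def subset(values, mask):
    """Keep only the elements of a 1D variable where the mask is True."""
    return [value for value, keep in zip(values, mask) if keep]
```

The same mask can be applied to every variable that shares the trajectory's along-track dimension, so that variable and spatial subsetting stay consistent with each other.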