Vision for Legacy HDF (.he4), SMAP and ICESat-2 migration to the Cloud

Possible Feature Roadmap (highly subjective; order and content subject to change):

  • General: enable, configure, and test the main service for each dataset
    • Trajectory Subsetter - mostly pre-configured for SMAP & ICESat-2
    • HOSS - minor updates for SMAP and ICESat-2 special cases (e.g., TRT-290 - HOSS support for SMAP L3)
      • Missing CF metadata (configuration files already accommodate this, mostly)
      • Lower-Left Orientation
      • Edge aligned dimension scales
    • SMAP to GeoTIFF - special case in the Trajectory Subsetter for SMAP data; enable the existing code for in-cloud deployment
  • Harmony-GDAL-Formatter - GeoTIFF/COG - ICESat-2 & SMAP (L3/L4 Formats)
    (Note: HOSS (via OPeNDAP) produces NetCDF-4 output even for HDF-5 inputs, but NetCDF-4 (.nc4) is built on HDF-5 (.h5). Most likely .nc4 is the default and is preferred over .h5 in all cases, and .h5 output is not available.)
    • Refactor HGA - split the archive into 2 or 3 separate services: HGF, HGA, HGAgg (TRT-213 - Images for GITC?)
    • COG formatter - GeoTIFF source (1st), NetCDF source (2nd)
    • See HGA Roadmap
  • L1/L2 Formats (mostly custom - port on-prem code to the cloud):
    • Shapefile (Vector, Custom code exists, GDAL?)
    • NetCDF (custom .h5 to .nc4 upgrade with CF support; port to the cloud)
    • ASCII (CSV, Tabular, other? - custom code exists, GDAL?)
    • KML, KMZ (GDAL + custom front-end?)
  • L3/L4 Reprojection - PyResample based?
  • HGA-Reprojection? (SMAP focus, currently GDAL based, but not clear if GDAL specific option is best path)
  • Swath-Projector for Trajectory Data (e.g. ICESat-2) ?
    • Minor fix for 1D vs 2D inputs
    • Not an existing on-prem feature, but an opportunity identified from use-case analysis
  • Legacy HDF-4 file handling (HDF-4 to NetCDF, OPeNDAP?)
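The note above that .nc4 is built on .h5, and the HDF-4 handling item, can be made concrete by checking a file's leading bytes instead of its extension. NetCDF-4 files are stored as HDF-5, so they carry the HDF-5 superblock signature, while HDF-4 and classic NetCDF files carry their own magic numbers. The following is a minimal standard-library sketch; `detect_format` is a hypothetical helper, not part of HOSS, Harmony, or OPeNDAP.

```python
from pathlib import Path

# HDF-5 superblock signature (HDF5 File Format Specification). NetCDF-4 files
# use HDF-5 as their storage layer, so .nc4 and .h5 files both carry it.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# HDF-4 files (including HDF-EOS-2) begin with this 4-byte magic number.
HDF4_MAGIC = b"\x0e\x03\x13\x01"


def detect_format(path):
    """Classify a file by its leading bytes rather than by its extension."""
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        head = f.read(8)
        if head.startswith(HDF4_MAGIC):
            return "hdf4"
        # Classic NetCDF family: "CDF" followed by a version byte (1, 2 or 5).
        if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02", b"\x05"):
            return "netcdf-classic"
        # The HDF-5 signature may follow a user block, so it is searched for
        # at offset 0, then 512, 1024, 2048, ... (doubling each time).
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "hdf5"
            offset = 512 if offset == 0 else offset * 2
    return "unknown"
```

A NetCDF-4 granule reports `"hdf5"` here. This fits the question, raised for SMAP L3/L4 below, of whether .h5 vs .nc4 is "possibly just a file extension change": the container is the same, and the differences lie in the conventions and metadata written into it.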

Feature Summary:

  • L1/L2 - Trajectory Subsetter - SMAP special case - Gridded outputs
    • Enable in the existing code base for cloud deployment
    • Gridded outputs to .h5 (cropped grid; revive the unmerged branch? Not GeoTIFF, as on-prem)
    • GeoTIFF formatter (COG, HGA) via chaining. (The alternative is enabling the existing on-prem gridding-to-GeoTIFF functionality.)
    • Reprojection special case? (HGA?, chaining)

  • L1/L2 - Trajectory Subsetter + general chaining for formats (ICESat-2, SMAP, other?)
    • Projection - use Swath-Projector (minor updates likely, 1D inputs)?
      • Potential use cases for Oblique Equal-Area, Oblique Mercator, or Oblique EquiDistance projections
      • Note: SMAP follows a spiraling trajectory and is already "gridded" with CEA row/column data
      • Not an existing on-prem feature
    • Reformat vector data (Shapefile, KML, NetCDF, multi-file ASCII) as microservices, lifted mostly from on-prem implementations (or ported/rewritten)
      • (Note size constraints, e.g., ICESat-2 files)
      • NetCDF: CF, metadata upgrade from .h5 (on-prem implementation)
      • Shapefile and KML format output (.shp, .kml, .kmz)
      • ASCII format output? - rewrite to Python, chaining - single file per variable
        • CSV
          • Choice of delimiters? (Comma, tab)
          • “Row of vars” per time-stamp? (see Tabular ASCII)
          • “Time, Name, Value” format
            or “Time, Value” per variable, separate files (variable name as filename?)
          • GDAL standard?
        • “Tabular-ASCII” - lifted from on-prem, or rewritten to Python
          • Textual layout
          • “Row of vars” per time-stamp
          • Blocked per group and/or frequency (count)
        • OPeNDAP / GDAL (HGA): various ASCII format choices?

  • L3/L4 - HOSS + chaining (ICESat-2, SMAP, other?) √ (some reconfiguration, perhaps code tweaks?)
    (Note: output will be .nc4, not the originally scoped .h5, but hopefully this is not significant.)
      • ICESat-2 has some notorious quirkiness - Lower-Left orientation, Edge-Aligned dimension scales
      • Possible support for missing dimension-scales - use CF-ish config entry for grid-parameters (SMAP)
        - see: GeoTagging for a Geo-Located Grid Array
    • Reprojection - TBD (I suspect not HGA, rather PyResample based, Data Reprojection Wizard)
    • Reformat - GeoTIFF/COG (HGA?), other standard Cloud formats, Zarr
  • HDF-EOS-4 legacy data - multiple divergent data products
    • HDF4-to-NetCDF-4 pre-processor
    • OPeNDAP functionality?
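The "minor updates likely, 1D inputs" item for using the Swath-Projector on trajectory data could be as small as promoting 1-D along-track coordinates to 2-D arrays of shape (n, 1) before they reach swath-oriented code. The following is a minimal pure-Python sketch of that idea. `as_swath_2d` and `prepare_coordinates` are hypothetical helpers and are not taken from the Swath-Projector code base.

```python
def _shape(grid):
    """Return (rows, columns) for a list-of-rows grid."""
    return (len(grid), len(grid[0]) if grid else 0)


def as_swath_2d(values):
    """Promote 1-D along-track values to an n x 1 grid (list of rows).

    Values that are already 2-D (a list of rows) are copied unchanged, so
    swath-oriented code can treat trajectory and swath inputs alike.
    """
    if values and isinstance(values[0], (list, tuple)):
        return [list(row) for row in values]
    return [[value] for value in values]


def prepare_coordinates(latitudes, longitudes):
    """Return 2-D latitude/longitude grids, checking that their shapes agree."""
    lat2d = as_swath_2d(latitudes)
    lon2d = as_swath_2d(longitudes)
    if _shape(lat2d) != _shape(lon2d):
        raise ValueError(f"shape mismatch: {_shape(lat2d)} vs {_shape(lon2d)}")
    return lat2d, lon2d
```

With NumPy arrays, the same promotion of a 1-D array is `values[:, numpy.newaxis]`. A single cross-track column keeps the along-track ordering intact, which any resampling step would still need to respect.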

Some preliminary thoughts:

  • Separate support for Projection (Swath-Projection-Gridding)?
    • Vs. Regridding (same projection)
    • Vs. Reprojection (Equal-Area to Conformal, Cartesian to Polar, others)
  • Feature expansion in Cloud Services to match on-prem
    • Minus any features that are no longer relevant (H5-EOS)
    • “Main Services”: HOSS for L3/L4, Trajectory Subsetter for L1/L2
      • For Variable and Spatial/Temporal Subsetting 
      • BBox and Shape/Polygon Region
      • HDF-5, NetCDF-4
    • + Formats, + Projection/Reprojection
    • Alternative for Legacy (HDF4, HDF-EOS-2)?
      • or updates to OPeNDAP?
  • Back-port to on-prem: Harmony-on-prem within SDPS?
    • To potentially retire legacy code base for data not yet migrated
    • Open port on EIL for a localized Harmony access?
    • EDSC/CMR reconfiguration


SMAP L1/L2

  • Trajectory Subsetter √ (done)
  • GeoTIFF special case - minor adapter enablement
    • + some coding refinement to enable GeoTIFF outputs
    • -or- Refactor as a separate microservice (SMAP-to-GeoTIFF)?
    • -or- Gridding as an option within HDF-5 format (SMAP-to-Grid)
      • We have this functionality in our archives if not immediately accessible.
      • + GeoTIFF formatter (HGA) √ (preferred, I think)
  • Reprojection (chaining): GeoTIFF/COG => HGA, other?
  • .he5 to NetCDF format output (.nc4)?
    • Microservices - lifting from on-prem is possible
    • from GeoTIFF, w/ aggregation 
    • -or- from HDF-5 w/ Grid & Trajectory handling?
      • again, available from on-prem if useful.
  • Shapefile format output (.shp) - as a microservice;
    • lifting from on-prem is an option
  • ASCII format output? - rewrite to Python, chaining
    • CSV
      • Choice of delimiters? (Comma, tab)
      • “Row of vars” per time-stamp? (see Tabular ASCII)
      • “Time, Name, Value” format
        or “Time, Value” per variable?
      • GDAL standard?
    • “Tabular-ASCII” - lifted from on-prem, or rewritten
      • Textual layout
      • “Row of vars” per time-stamp
      • Blocked per group and/or frequency (count)
    • OPeNDAP / GDAL (HGA): various ASCII format choices?
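The CSV layouts under discussion ("Time, Name, Value", a "row of vars" per time stamp, and "Time, Value" per variable in separate files named after the variable) can be compared using Python's standard `csv` module. The following is a minimal sketch. It assumes the subsetted data is already in memory as per-variable lists aligned to one shared list of time stamps. The function names are hypothetical, and `delimiter` covers the comma vs tab choice.

```python
import csv
import io


def _writer(buf, delimiter):
    return csv.writer(buf, delimiter=delimiter, lineterminator="\n")


def long_format(times, variables, delimiter=","):
    """"Time, Name, Value": one row per (time stamp, variable) pair."""
    buf = io.StringIO()
    writer = _writer(buf, delimiter)
    writer.writerow(["Time", "Name", "Value"])
    for i, t in enumerate(times):
        for name, values in variables.items():
            writer.writerow([t, name, values[i]])
    return buf.getvalue()


def wide_format(times, variables, delimiter=","):
    """"Row of vars" per time stamp: one column per variable."""
    buf = io.StringIO()
    writer = _writer(buf, delimiter)
    names = list(variables)
    writer.writerow(["Time"] + names)
    for i, t in enumerate(times):
        writer.writerow([t] + [variables[n][i] for n in names])
    return buf.getvalue()


def per_variable(times, variables, delimiter=","):
    """"Time, Value" per variable: returns {filename: file contents}."""
    files = {}
    for name, values in variables.items():
        buf = io.StringIO()
        writer = _writer(buf, delimiter)
        writer.writerow(["Time", "Value"])
        writer.writerows(zip(times, values))
        # HDF-5 variable paths contain "/", which cannot appear in a filename.
        files[name.strip("/").replace("/", "_") + ".csv"] = buf.getvalue()
    return files
```

For two variables `a` and `b` at times `t0` and `t1`, `long_format` yields four data rows, while `wide_format` yields two rows with columns `Time,a,b`. The long layout stays uniform when variables differ in type or units. The wide layout is closer to the on-prem "Tabular-ASCII" textual layout.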


Re. ICESat-2 L1/L2 - multiple divergent data products

L1/L2 (Trajectory) -

  • Trajectory Subsetter √ (done)
  • No gridding, reprojection, or NetCDF format requirements
  • Shapefile (.shp) and ASCII formats (see above), but with size constraints


SMAP L3/L4 - including 3D / Multi-Dimension (?)

  • HOSS Updates for projected-grid data. √ (done)
  • HOSS - with configuration file entries, minor updates?
    • SMAP is notoriously lacking in CF content
  • NetCDF format output (.nc4)
    • default for HOSS (common .h5 to .nc4 “fixer”/Formatter - built into OPeNDAP!)
    • An issue if HDF-5 is desired for close compliance with the source data; possibly just a file-extension change?
  • GeoTIFF / COG format - HGA chaining? (I think so)
  • Regridding/Reprojection feature (TBD)
  • ASCII format? (which ASCII)
    • OPeNDAP has an option, I believe
    • Other choices, HGA, chaining?
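SMAP's lack of CF content often means missing dimension scales. Where a gridded product lacks them, the 1-D projected x/y coordinates can be regenerated from a few grid parameters, in the spirit of the "CF-ish config entry for grid-parameters" idea in the Feature Summary. The following is a minimal sketch. It assumes the grid is regular in projected coordinates and is described by its upper-left corner, a square cell size in metres, and row/column counts. `GridParameters` and `dimension_scales` are hypothetical names, not existing HOSS configuration keys, and the test values are arbitrary rather than real EASE-Grid parameters.

```python
from dataclasses import dataclass


@dataclass
class GridParameters:
    """Hypothetical grid description, e.g. read from a configuration entry."""
    x_min: float      # projected x of the grid's left (west) edge, metres
    y_max: float      # projected y of the grid's top (north) edge, metres
    cell_size: float  # square cell size, metres
    n_cols: int
    n_rows: int


def dimension_scales(grid):
    """Return cell-centre x values (west to east) and y values (north to south)."""
    half = grid.cell_size / 2
    x = [grid.x_min + half + i * grid.cell_size for i in range(grid.n_cols)]
    y = [grid.y_max - half - j * grid.cell_size for j in range(grid.n_rows)]
    return x, y
```

Cell centres (offset by half a cell from the edges) are the usual choice for CF coordinate variables. Recording the edges instead would require CF cell bounds.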


ICESat-2 - ATLAS - L3/L4 (Gridded, mix of projections, some subtle grid issues) -

  • HOSS - with configuration file entries, minor updates?
    • ICESat-2 has some notorious quirkiness - Lower-Left orientation, Edge-Aligned dimension scales
  • NetCDF - default for HOSS (common .h5 to .nc4 “fixer”/Formatter - built into OPeNDAP!)
  • To GeoTIFF / COG - HGA chaining?
  • Regridding/Reprojection feature (TBD)
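The two ICESat-2 gridded quirks noted above can be illustrated in a few lines. "Lower-Left orientation" is taken here to mean that row 0 of the array is the southernmost row rather than the northernmost. "Edge-Aligned dimension scales" is taken to mean that each of the n dimension values marks the leading edge of its cell rather than its centre. The following is a minimal pure-Python sketch under those assumptions. The function names are hypothetical, and it does not show how HOSS itself handles these cases (it may, for example, adjust index ranges rather than rewrite data).

```python
def edge_to_centre(edges):
    """Shift edge-aligned dimension values to cell centres.

    Assumes a regular grid where each of the n values marks the same
    (leading) edge of its cell, so every centre is half a cell further on.
    """
    if len(edges) < 2:
        raise ValueError("need at least two values to infer the cell size")
    step = edges[1] - edges[0]
    return [e + step / 2 for e in edges]


def flip_to_upper_left(rows, y_values):
    """Reorder a lower-left-oriented grid so row 0 is the northernmost row.

    `rows` is a list of grid rows and `y_values` the matching dimension
    scale, both ordered south to north; both are returned north to south.
    """
    return rows[::-1], y_values[::-1]
```

The row flip and the dimension-scale flip are done together so that data and coordinates stay aligned. Inferring the cell size from the first two values assumes regular spacing.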


HDF-EOS-4 legacy data - multiple slightly divergent data products

  • HOSS - with configuration file entries, minor updates?
    • Assuming OPeNDAP updates - not sure how feasible
  • HDF4-to-NetCDF pre-processor
  • Note .nc4 is the default output for HOSS (common .h5 to .nc4 “fixer”/Formatter - built into OPeNDAP!),
    so compliance of the processed outputs with the source data may be an issue.
  • ?? HEG rewrite ?? (Unlikely, except possibly - to capture unique reprojection/interpolation handling)
  • Regridding/Reprojection feature (TBD)
  • GeoTIFF - HGA

