
Capturing ideas and snippets of information about cloud analytics.


Building blocks

(Open to suggestions about better categories or names of categories!)

Data Access

  • WCS 2.0 - multi-dimensional coverage data access over the Internet
  • OPeNDAP - discipline-neutral protocol for requesting and providing data across the World Wide Web
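Here is a quick sketch of what requests to these two services can look like. It assumes hypothetical endpoints (`https://example.org/...`), a hypothetical coverage `AvgLandTemp`, and axis labels `Lat`/`Long`. Real axis names depend on the server's coverage description. WCS 2.0 KVP requests select a region with repeated `subset=Axis(low,high)` parameters. OPeNDAP (DAP2) appends a constraint expression with `[start:stride:stop]` hyperslabs, and a suffix such as `.ascii` picks the response encoding. The URLs are only built here, not fetched.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; substitute a real WCS / OPeNDAP service.
WCS_ENDPOINT = "https://example.org/wcs"
OPENDAP_URL = "https://example.org/opendap/sst.nc"


def wcs_getcoverage_url(coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS 2.0.1 KVP GetCoverage URL.

    `subsets` maps an axis label to a (low, high) pair; each pair becomes
    one `subset=Axis(low,high)` parameter.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("format", fmt),
    ]
    for axis, (low, high) in subsets.items():
        params.append(("subset", f"{axis}({low},{high})"))
    # Leave parentheses and commas readable; everything else is percent-encoded.
    return f"{WCS_ENDPOINT}?{urlencode(params, safe='(),')}"


def opendap_ascii_url(variable, *slices):
    """Build a DAP2 URL with a constraint expression and an ASCII response.

    Each slice is a (start, stride, stop) hyperslab over one dimension.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{OPENDAP_URL}.ascii?{variable}{hyperslab}"
```

For example, `wcs_getcoverage_url("AvgLandTemp", {"Lat": (40, 50), "Long": (-10, 0)})` asks for a 10° x 10° box. `opendap_ascii_url("sst", (0, 1, 0), (10, 2, 20))` asks for every second row of the first time step.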

Data Processing Services

  • WPS 2.0 - rules for standardizing inputs and outputs of geospatial processing services

  • WCPS 1.0 - protocol-independent language for the extraction, processing, and analysis of multi-dimensional coverages representing sensor, image, or statistics data.
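Here is a minimal WCPS sketch. The query averages a hypothetical `AvgLandTemp` coverage over a lat/long box for one month. It assumes the coverage has `Lat`, `Long`, and `ansi` (time) axes. One common way to submit WCPS over HTTP is the WCS 2.0 Processing Extension's `ProcessCoverages` request. The endpoint is a placeholder, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode

WCS_ENDPOINT = "https://example.org/wcs"  # hypothetical server


def wcps_regional_mean(coverage, lat, lon, time):
    """Return a WCPS query that averages a coverage over a lat/long box at one time.

    `lat` and `lon` are (low, high) pairs; WCPS trims use the `low:high` syntax.
    """
    return (
        f"for $c in ({coverage}) "
        f"return avg($c[Lat({lat[0]}:{lat[1]}), Long({lon[0]}:{lon[1]}), "
        f'ansi("{time}")])'
    )


def process_coverages_url(query):
    """Wrap a WCPS query in a WCS 2.0 Processing Extension KVP request."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,
    }
    return f"{WCS_ENDPOINT}?{urlencode(params)}"
```

The query language is what makes WCPS protocol-independent. The same string could be sent via GET, POST, or another binding, and only the wrapper function would change.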

Data Models & Formats

  • Common Data Model - Unidata's abstract data model for scientific datasets, merging the netCDF, HDF5, and OPeNDAP data models
  • Cloud Optimized GeoTIFF - GeoTIFF with internal organization that enables more efficient workflows on the cloud via HTTP GET range requests
  • EO JSON - several efforts to develop JSON specifications for coverage data
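To make the Cloud Optimized GeoTIFF idea concrete, a client can fetch just the start of the file with an HTTP `Range` header. The COG layout puts the image file directories (IFDs) near the beginning, so one small read usually describes the tiles. The client then decodes the TIFF header to find the first IFD. This is a sketch: the URL is hypothetical, the 16 KiB size is an arbitrary choice, and the request is built but not sent. Sending it would use `urllib.request.urlopen(req)`, and the server must support range requests.

```python
import struct
import urllib.request


def header_range_request(url, nbytes=16384):
    """Build (but do not send) a GET request for the first `nbytes` of a file."""
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})


def parse_tiff_header(data):
    """Decode a TIFF/BigTIFF header into (struct endianness, variant, first IFD offset)."""
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(endian + "H", data[2:4])
    if magic == 42:  # classic TIFF: 4-byte IFD offset at byte 4
        (ifd_offset,) = struct.unpack(endian + "I", data[4:8])
        return endian, "TIFF", ifd_offset
    if magic == 43:  # BigTIFF: 8-byte IFD offset at byte 8
        (ifd_offset,) = struct.unpack(endian + "Q", data[8:16])
        return endian, "BigTIFF", ifd_offset
    raise ValueError(f"unexpected TIFF magic number {magic}")
```

From the IFD, a reader learns the tile offsets and byte counts. It can then issue further range requests for only the tiles it needs.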

Data Libraries

  • xarray - toolkit for analytics on labeled multi-dimensional arrays, bringing pandas-style labels and indexing to N-D data
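A small illustration of the xarray style of analysis: select by coordinate labels rather than integer positions, then reduce per month. The data cube is synthetic random numbers (assumed daily values on a coarse 5° grid), not real observations.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily "temperature" cube; values are random, for illustration only.
times = pd.date_range("2018-01-01", periods=365, freq="D")
lat = np.arange(-10.0, 10.1, 5.0)  # -10, -5, 0, 5, 10
lon = np.arange(0.0, 20.1, 5.0)    # 0, 5, 10, 15, 20
rng = np.random.default_rng(seed=0)
temperature = xr.DataArray(
    15.0 + rng.standard_normal((times.size, lat.size, lon.size)),
    coords={"time": times, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="temperature",
)

# Label-based selection; slices on coordinate labels include both ends.
band = temperature.sel(lat=slice(-5, 5))
# Group the time axis by calendar month and average within each group.
monthly_mean = band.groupby("time.month").mean()  # dims: month, lat, lon
# Spatial mean over the band, leaving one value per day.
regional_series = band.mean(dim=("lat", "lon"))
```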

Visualization & Interaction

  • PyTables - tool built on top of the HDF5 library for interactively browsing, processing, and searching very large amounts of data

  • Jupyter Notebooks - interactive code execution and visualization

Metadata & Catalogs

  • SpatioTemporal Asset Catalog - specification for exposing Earth observation data as spatiotemporal asset catalogs (could this provide an on-the-fly catalog for cloud pipelines?)
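Exploring the on-the-fly catalog idea: a pipeline step could emit one STAC Item, which is a GeoJSON Feature plus STAC fields, for each file it writes. Here is a minimal sketch, assuming STAC 1.0.0, a simple lat/long bounding box, and a hypothetical asset URL. A real catalog would also fill `links` (e.g. to a parent collection) and add extension fields.

```python
import json


def make_stac_item(item_id, bbox, datetime_utc, asset_href):
    """Return a minimal STAC Item describing one Cloud Optimized GeoTIFF."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Exterior ring listed counterclockwise and closed (first point == last).
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


item = make_stac_item(
    "scene-001",                          # hypothetical ID
    (-10.0, 40.0, 0.0, 50.0),
    "2018-06-01T00:00:00Z",
    "https://example.org/scene-001.tif",  # hypothetical URL
)
item_json = json.dumps(item, indent=2)
```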

Interoperability Tools

  • OpenAPI Initiative - standardizing how REST APIs are described (based on Swagger)
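To show what an OpenAPI description looks like, here is a minimal OpenAPI 3.0 document for a hypothetical "coverage subset" endpoint, written as a Python dict. The path, parameters, and responses are invented for illustration. A small check enforces one rule of the spec: every `{name}` in a path template must be declared as a required `in: path` parameter.

```python
import json
import re

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

spec = {
    "openapi": "3.0.3",
    "info": {"title": "Coverage subset API (example)", "version": "0.1.0"},
    "paths": {
        "/coverages/{coverageId}": {
            "get": {
                "summary": "Return a spatial subset of one coverage",
                "parameters": [
                    {"name": "coverageId", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "bbox", "in": "query", "required": False,
                     "description": "west,south,east,north in degrees",
                     "style": "form", "explode": False,
                     "schema": {"type": "array", "items": {"type": "number"},
                                "minItems": 4, "maxItems": 4}},
                ],
                "responses": {
                    "200": {"description": "GeoTIFF subset",
                            "content": {"image/tiff": {
                                "schema": {"type": "string", "format": "binary"}}}},
                    "404": {"description": "Unknown coverageId"},
                },
            }
        }
    },
}


def undeclared_path_params(spec):
    """Return (method, path, name) for each {name} that an operation does not
    declare as a required path parameter (path-level parameters are ignored)."""
    missing = []
    for path, item in spec["paths"].items():
        names = re.findall(r"\{([^}]+)\}", path)
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            declared = {p["name"] for p in op.get("parameters", [])
                        if p.get("in") == "path" and p.get("required")}
            missing.extend((method, path, n) for n in names if n not in declared)
    return missing


document = json.dumps(spec, indent=2)  # the JSON form tools like Swagger UI consume
```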

Other NASA work

  • Cumulus - Cloud-based data ingest, archive, distribution and management system for EOSDIS

Tutorials & Articles

Tutorials

Articles

  • Fostering Cross-Disciplinary Earth Science Through Datacube Analytics, Baumann et al., 2018 - Abstract, Chapter PDF

Questions

  • How does one do interprocess communication in the cloud? In the old days there were shared memory, pipes, sockets, and files. In the cloud it seems there's primarily HTTP (and maybe sockets) or files.
  • How do you decide when it's better to write out a file so you can stop a process that is billed per minute, versus holding data in memory so you don't incur storage charges?
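One way to frame the second question is as a break-even calculation. The sketch assumes a deliberately simple, hypothetical cost model. A running process is billed per hour of compute. A spilled file is billed per GB-month of object storage, using the common 730-hours-per-month approximation. Restarting means paying compute again to reload the data. The model ignores request and egress fees, plus any disk that stays billed while an instance is stopped. All rates are inputs, not real provider prices.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common billing approximation


@dataclass
class Rates:
    """Hypothetical prices; plug in your provider's actual rates."""
    compute_per_hour: float      # $ per hour for the running process/instance
    storage_per_gb_month: float  # $ per GB-month of object storage


def hold_in_memory_cost(rates, idle_hours):
    """Keep the process alive (and billed) while the data waits."""
    return rates.compute_per_hour * idle_hours


def spill_to_storage_cost(rates, size_gb, idle_hours, reload_hours):
    """Write a file, stop the process, then restart and re-read later."""
    storage = rates.storage_per_gb_month * size_gb * idle_hours / HOURS_PER_MONTH
    reload = rates.compute_per_hour * reload_hours  # compute spent reloading
    return storage + reload


def break_even_idle_hours(rates, size_gb, reload_hours):
    """Idle time above which spilling to storage is cheaper than holding.

    Solves c*t = s*g*t/730 + c*r for t.
    """
    saving_per_hour = (rates.compute_per_hour
                       - rates.storage_per_gb_month * size_gb / HOURS_PER_MONTH)
    if saving_per_hour <= 0:
        return float("inf")  # storing costs more per hour than computing; never spill
    return rates.compute_per_hour * reload_hours / saving_per_hour
```

Under this model, the answer is driven mostly by how long the data sits idle compared with the time needed to reload it. Storage per hour is usually tiny next to compute per hour.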