Capturing ideas and snippets of information about cloud analytics.


Building blocks

(as this list grows, we should split it into specs, libraries, etc.)

  • xarray - pandas-like toolkit for analytics on labeled multi-dimensional arrays
  • Common Data Model - Unidata's abstract data model for scientific datasets, merges netCDF, HDF5, and OPeNDAP data models
  • Cloud Optimized GeoTIFF - GeoTIFF with internal organization that enables more efficient workflows on the cloud via HTTP GET range requests
  • SpatioTemporal Asset Catalog - expose Earth observation data as spatiotemporal asset catalogs (possible on-the-fly catalog for cloud pipelines?)
  • OpenAPI Initiative - standardizing how to describe REST APIs (based on Swagger)
  • WCS 2.0 - multi-dimensional coverage data access over the Internet
  • WPS 2.0 - rules for standardizing inputs and outputs for geospatial processing services
  • OPeNDAP - discipline-neutral means of requesting and providing data across the World Wide Web
  • EO JSON - a number of efforts to develop JSON specs for coverage data
  • PyTables - Python package built on top of the HDF5 library for interactively browsing, processing, and searching very large amounts of data
  • Jupyter Notebooks - interactive code execution and visualization

  • Cumulus - Cloud-based data ingest, archive, distribution and management system for EOSDIS
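
The Cloud Optimized GeoTIFF entry above relies on HTTP GET range requests. A minimal sketch of that access pattern follows, using only the Python standard library. The helper names (range_header, fetch_bytes, parse_tiff_header) are made up for illustration, the URL is whatever the caller supplies, and a real COG reader would go on to read the IFDs and tile offsets, which this does not attempt.

```python
import struct
import urllib.request


def range_header(offset, length):
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_bytes(url, offset, length):
    """Fetch one byte range from `url` (hypothetical; supplied by the caller)."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        # A server that honors the range replies 206 Partial Content;
        # a 200 means it ignored the header and is sending the whole file.
        if response.status != 206:
            raise RuntimeError(f"range not honored (HTTP {response.status})")
        return response.read()


def parse_tiff_header(data):
    """Return (byte order, TIFF flavor) from the first 4 bytes of a file."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", data[2:4])
    flavor = {42: "classic", 43: "bigtiff"}.get(magic)
    if flavor is None:
        raise ValueError(f"unexpected TIFF magic number {magic}")
    return ("little" if order == "<" else "big"), flavor
```

Usage would look something like parse_tiff_header(fetch_bytes(url, 0, 16)): only the first 16 bytes cross the network, not the whole file.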


Questions

  • How does one do interprocess communication in the cloud? In the old days there were shared memory, pipes, sockets, and files. In the cloud the options seem to be primarily HTTP (and maybe sockets) or files.
  • How do you decide when it's better to write out a file so you can stop a process that's billed per minute vs. holding things in memory so you don't incur storage charges?
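
One rough way to frame the second question is as a break-even calculation. The sketch below is only a toy model under stated assumptions: compute is billed per minute, storage per GB-month (taken as 30 days), and the only overhead of the write-and-stop option is the extra compute time spent saving and reloading. Request and transfer charges are ignored, and all prices are inputs supplied by the caller, not real provider rates. The function name is made up for illustration.

```python
import math

MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day billing month


def break_even_idle_minutes(compute_per_minute, storage_per_gb_month,
                            size_gb, save_reload_minutes):
    """Idle time beyond which writing a file and stopping the process is cheaper.

    Keep running:   cost = compute_per_minute * idle
    Write and stop: cost = storage_per_minute * idle + overhead
    where overhead is the compute time spent saving and reloading.
    Returns math.inf when storage costs at least as much per minute as compute.
    """
    storage_per_minute = storage_per_gb_month * size_gb / MINUTES_PER_MONTH
    overhead = compute_per_minute * save_reload_minutes
    saving_per_minute = compute_per_minute - storage_per_minute
    if saving_per_minute <= 0:
        return math.inf  # stopping the process never pays off in this model
    return overhead / saving_per_minute
```

If the process is expected to sit idle for longer than the returned number of minutes, this model favors writing the file and shutting down; otherwise it favors holding the data in memory.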