ESO Cloud Analytics Notepad

Capturing ideas and snippets of information about cloud analytics.

Building blocks

(as this list grows, we should split it into specs, libraries, etc.)

xarray - Python library for analytics on labeled multi-dimensional arrays, with a pandas-like API
Common Data Model - Unidata's abstract data model for scientific datasets, merges netCDF, HDF5, and OPeNDAP data models
Cloud Optimized GeoTIFF - GeoTIFF with internal organization that enables more efficient workflows on the cloud via HTTP GET range requests
SpatioTemporal Asset Catalog - specification for describing geospatial assets, such as Earth observation imagery, in a common JSON format so they can be indexed and searched (possible on-the-fly catalog for cloud pipelines?)
OpenAPI Initiative - standard for describing REST APIs (based on Swagger)
WCS 2.0 - OGC Web Coverage Service, for accessing multi-dimensional coverage data over the Internet
WPS 2.0 - OGC Web Processing Service, with rules for standardizing the inputs and outputs of geospatial processing services
OPeNDAP - discipline-neutral protocol for requesting and providing data across the web
EO JSON - a number of efforts to develop JSON specs for coverage data
PyTables - Python package built on top of the HDF5 library for interactively browsing, processing, and searching very large amounts of data
Jupyter Notebooks - Interactive code execution and visualization

Cumulus - Cloud-based data ingest, archive, distribution and management system for EOSDIS
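Cloud Optimized GeoTIFF relies on HTTP GET range requests: a client asks for only the bytes it needs (the header, then selected tiles) instead of downloading the whole file. Below is a minimal, self-contained sketch using only the Python standard library. It assumes a hypothetical local test server (`RangeHandler`) standing in for an object store such as Amazon S3, and a synthetic byte string (`FAKE_TIFF`) in place of a real GeoTIFF. The client fetches just the 8-byte TIFF header and decodes the byte order, the magic number 42, and the offset of the first image file directory (IFD).

```python
import struct
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Synthetic stand-in for a remote GeoTIFF (assumption): a little-endian TIFF header
# ("II", magic number 42, first IFD at byte offset 8) followed by 1 MB of filler.
FAKE_TIFF = b"II" + struct.pack("<HI", 42, 8) + b"\x00" * 1_000_000


class RangeHandler(BaseHTTPRequestHandler):
    """Hypothetical test server that honors a single 'Range: bytes=start-end' header,
    the way object stores do for HTTP GET range requests."""

    def do_GET(self):
        range_header = self.headers.get("Range")
        if range_header and range_header.startswith("bytes="):
            start_s, end_s = range_header[len("bytes="):].split("-")
            start = int(start_s)
            end = int(end_s) if end_s else len(FAKE_TIFF) - 1
            body = FAKE_TIFF[start:end + 1]
            self.send_response(206)  # Partial Content
            self.send_header("Content-Range", f"bytes {start}-{end}/{len(FAKE_TIFF)}")
        else:
            body = FAKE_TIFF
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def read_tiff_header(url):
    """Fetch only the first 8 bytes of a TIFF over HTTP and decode its header."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-7"})
    # Empty ProxyHandler: bypass environment proxy settings for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        status, header = resp.status, resp.read()
    byte_order = header[:2]  # b"II" = little-endian, b"MM" = big-endian
    fmt = "<HI" if byte_order == b"II" else ">HI"
    magic, first_ifd = struct.unpack(fmt, header[2:8])
    return status, len(header), byte_order, magic, first_ifd


def demo():
    server = HTTPServer(("127.0.0.1", 0), RangeHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return read_tiff_header(f"http://127.0.0.1:{server.server_address[1]}/example.tif")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` should return `(206, 8, b'II', 42, 8)`: a 206 Partial Content response carrying 8 of the 1,000,008 bytes. A real COG reader (GDAL's `/vsicurl/`, for instance) would follow the IFD offsets with further range requests to reach only the tiles it needs.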

How does one do interprocess communication in the cloud? In the old days there were shared memory, pipes, sockets, and files. In the cloud, the options seem to be mainly HTTP (and maybe sockets) or files.
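A minimal sketch of the HTTP option: one process exposes a small endpoint, and another sends it a message and waits for the reply. Everything here is an assumption for illustration. The `SumHandler` service, the `/sum` path, and the JSON payload format are hypothetical, and both sides run in one Python process (server on a background thread) only so the example is self-contained; in the cloud they would typically be separate containers or functions reachable by URL.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SumHandler(BaseHTTPRequestHandler):
    """Hypothetical 'worker' service: accepts a JSON list of numbers via POST
    and replies with their sum as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        values = json.loads(self.rfile.read(length))
        reply = json.dumps({"sum": sum(values)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence per-request logging
        pass


def call_worker(url, values):
    """Client side: POST a message to the worker and block until it replies."""
    data = json.dumps(values).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    # Empty ProxyHandler: bypass environment proxy settings for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        return json.loads(resp.read())


def demo():
    server = HTTPServer(("127.0.0.1", 0), SumHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return call_worker(f"http://127.0.0.1:{server.server_address[1]}/sum", [1, 2, 3.5])
    finally:
        server.shutdown()
        server.server_close()
```

`demo()` should return `{'sum': 6.5}`. The trade-off against files: HTTP gives a synchronous request/response but requires the receiver to be running, while writing to shared storage decouples the two sides in time at the price of extra latency and storage cost.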
How do you decide when it's better to write out a file, so you can stop a process that is billed by the minute, versus holding data in memory so you don't incur storage charges?
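One way to frame this question is as a break-even calculation. The sketch below assumes a deliberately simple cost model: staying up costs a flat compute rate per minute, while persisting costs a one-off overhead (writing the file, restarting, reading it back) plus storage billed per GB-month and prorated over the idle time. All prices and names (`Prices`, `break_even_minutes`) are hypothetical placeholders, not any provider's actual rates or API, and real bills include other terms such as per-request fees.

```python
from dataclasses import dataclass
from typing import Optional

MINUTES_PER_MONTH = 30 * 24 * 60  # simplifying assumption: a 30-day billing month


@dataclass
class Prices:
    """Hypothetical unit prices; substitute your provider's actual rates."""
    compute_per_minute: float    # cost of keeping the process (and its memory) running
    storage_per_gb_month: float  # cost of keeping the file in storage
    restart_overhead: float      # one-off cost to write the file, restart, and read it back


def cost_hold_in_memory(idle_minutes: float, prices: Prices) -> float:
    return prices.compute_per_minute * idle_minutes


def cost_write_to_storage(idle_minutes: float, size_gb: float, prices: Prices) -> float:
    storage = size_gb * prices.storage_per_gb_month * idle_minutes / MINUTES_PER_MONTH
    return prices.restart_overhead + storage


def break_even_minutes(size_gb: float, prices: Prices) -> Optional[float]:
    """Idle time after which writing to storage becomes cheaper than holding in memory.

    Returns None if storage accrues cost at least as fast as compute,
    in which case writing out never pays off under this model.
    """
    storage_per_minute = size_gb * prices.storage_per_gb_month / MINUTES_PER_MONTH
    saving_per_minute = prices.compute_per_minute - storage_per_minute
    if saving_per_minute <= 0:
        return None
    return prices.restart_overhead / saving_per_minute


# Example with made-up numbers: 10 GB of intermediate data.
prices = Prices(compute_per_minute=0.01, storage_per_gb_month=0.02, restart_overhead=0.05)
t = break_even_minutes(10, prices)
print(f"break-even after about {t:.2f} idle minutes")
```

With these made-up numbers the break-even is roughly five minutes: for shorter idle gaps, holding the data in memory is cheaper; for longer ones, writing it out wins. Because the storage term is tiny next to the compute rate, the answer here is driven mostly by how expensive a restart is, not by the storage price.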