This draft is based on a recommendation of the OASIS E-Government Technical Committee, and aligns with findings of a white paper by the Industry Advisory Council [IAC], Enterprise Architecture Shared Interest Group, titled "Interoperability Strategy: Concepts, Challenges, and Recommendations".
We are currently modifying the protocol to operate in a web services environment. We refer to this as DAP4. We anticipate the first implementation of DAP4 will be available this summer.
As noted above, the OPeNDAP data access protocol addresses format dependence. We are now working on the layers above it: those associated with the semantics of the data for oceanographic applications, and with the restructuring of what we refer to as sequence data, generally in situ data.
One area where a standard could enable something that is difficult today is the area of order fulfillment notifications. Data orders imply an asynchronous process, such as the retrieval of data from tape and staging to disk, so that the typical scenario is:
There is no standard means of notification in Step 4: notifications are currently sent by either email or FTP push, and their formats are unique to each data or service provider. This makes it difficult to construct clients (especially in the field of applications) with fully automated machine-to-machine interfaces, that is, interfaces that could request data (or services) and, when the data are ready, retrieve them automatically.
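As an illustration of what a machine-readable notification could enable, the following is a minimal sketch in Python. It assumes a hypothetical JSON notification with `order_id`, `status`, and `urls` fields; no such standard exists today, and the field names, the `READY` status value, and the example URL are invented for this example only.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class OrderNotification:
    order_id: str
    status: str        # hypothetical values: "READY" or "FAILED"
    urls: List[str]    # locations of the staged files


def parse_notification(text: str) -> OrderNotification:
    """Parse a notification written in the hypothetical JSON layout above."""
    doc = json.loads(text)
    return OrderNotification(
        order_id=doc["order_id"],
        status=doc["status"],
        urls=list(doc.get("urls", [])),
    )


def files_to_retrieve(note: OrderNotification) -> List[str]:
    """Return the URLs a client should fetch; nothing if the order is not ready."""
    return note.urls if note.status == "READY" else []


# Hypothetical notification, as a provider might send it once staging completes.
example = (
    '{"order_id": "A-001", "status": "READY", '
    '"urls": ["ftp://example.org/staging/A-001/granule1.hdf"]}'
)
note = parse_notification(example)
print(files_to_retrieve(note))
```

With a shared layout of this kind, the same client code could handle notifications from any provider, whether they arrive by email, FTP push, or some other transport.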
From our experience working with Earth Science data users and data systems, we have found that:
ESML was developed in response to this situation. Based on XML, ESML was designed as an elegant solution to the research issue of data format heterogeneity, specifically targeting the special characteristics of science data and spatial imagery. ESML is unique in that it is not another new data format; instead, it is an external solution, based on syntactic metadata, for decoding existing formats. ESML provides a language for representing scientific data formats and structures, and its associated software library enables integration of disparate and often distributed data.
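The general idea of an external description driving the decoding of an existing format can be sketched as follows. This is not ESML syntax or the ESML library: the `description` dictionary, the field names, and the record layout are assumptions made for illustration, and the decoding uses Python's standard `struct` module.

```python
import struct

# Hypothetical external description of a binary record layout.
# It lives outside the data file; the file itself is left unchanged.
description = {
    "byte_order": "<",        # little-endian, no padding
    "fields": [
        ("lat", "f"),         # 32-bit float
        ("lon", "f"),         # 32-bit float
        ("temp", "h"),        # 16-bit signed integer
    ],
}


def decode_records(raw: bytes, desc: dict) -> list:
    """Decode fixed-size records from raw bytes using the external description."""
    fmt = desc["byte_order"] + "".join(code for _, code in desc["fields"])
    names = [name for name, _ in desc["fields"]]
    record = struct.Struct(fmt)
    return [dict(zip(names, values)) for values in record.iter_unpack(raw)]


# Two sample records standing in for the contents of an existing data file.
raw = struct.pack("<ffh", 10.0, 20.0, 285) + struct.pack("<ffh", 11.0, 21.0, 290)
for row in decode_records(raw, description):
    print(row)
```

Supporting another format under this approach means writing another description rather than another reader.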
The standard formats would constitute a distribution standard, not a processing or archive standard. This implies a flexible generation capability that is aligned with a "backend" distribution function where data is placed in some requested standard form for delivery to the user.
In conjunction with this would be standard access capabilities supplied through generally available I/O libraries to ease software development efforts.
The other problem is that, whereas gridded data are reaching the point where only a handful of formats are in use, there is almost no standardization for observational data.
Secondly, there is the issue of standardizing what is put into a data format. For example, the fact that you have dimensions or units in NetCDF doesn't tell you what they are, and for a program to use these things effectively, these elements must also be standardized. For our NetCDF gridded files we mostly use the COARDS convention, though many people use the expanded CF convention. We would also consider it essential to have these, or something similar, as part of the standard. Also, to beat a dead horse, there is no similar set of standards for observational data. The best I have seen so far are those for the ARGO program, which uses NetCDF files but with the contents, units, etc. very precisely laid out.
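To show why standardized contents matter to a program, here is a minimal sketch of a convention check. It works on plain dictionaries of variable attributes rather than on a real NetCDF file, and the `ACCEPTED_UNITS` table is an illustrative subset chosen for this example (`degrees_north` and `degrees_east` are the unit strings used for latitude and longitude in COARDS and CF); the variable names `lat`, `lon`, and `sst` are assumptions.

```python
# Illustrative subset of accepted unit strings, keyed by variable name.
ACCEPTED_UNITS = {
    "lat": {"degrees_north"},
    "lon": {"degrees_east"},
}


def check_variable(name: str, attrs: dict) -> list:
    """Return a list of problems found in one variable's attributes."""
    problems = []
    if "units" not in attrs:
        problems.append(f"{name}: missing 'units' attribute")
    elif name in ACCEPTED_UNITS and attrs["units"] not in ACCEPTED_UNITS[name]:
        problems.append(f"{name}: unexpected units {attrs['units']!r}")
    return problems


# Attributes as they might be read from a file's header (hypothetical values).
variables = {
    "lat": {"units": "degrees_north"},
    "lon": {"units": "deg"},   # units present, but not a standardized string
    "sst": {},                 # units missing entirely
}

for var_name, var_attrs in variables.items():
    for problem in check_variable(var_name, var_attrs):
        print(problem)
```

A file can be perfectly valid NetCDF and still fail a check like this, which is the gap a content convention fills.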
EOSDIS lacks a spatial subsetting capability, which is particularly important for users of land remote sensing data. Users should be able to request and download image sets by geographic extent rather than by arbitrary spatial units.
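A request by geographic extent can be sketched as a bounding-box selection. The example below is a simplified illustration, not an EOSDIS interface: it assumes a regular latitude/longitude grid held in nested lists, a hypothetical function name `subset_by_extent`, and an extent that does not cross the date line.

```python
def subset_by_extent(grid, lats, lons, south, north, west, east):
    """Keep only the grid cells whose coordinates fall inside the bounding box.

    grid[i][j] holds the value at latitude lats[i] and longitude lons[j].
    Assumes the box does not cross the date line.
    """
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    return {
        "lats": [lats[i] for i in rows],
        "lons": [lons[j] for j in cols],
        "data": [[grid[i][j] for j in cols] for i in rows],
    }


# A tiny 3 x 3 grid standing in for an image tile.
lats = [40.0, 30.0, 20.0]
lons = [-100.0, -90.0, -80.0]
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

subset = subset_by_extent(grid, lats, lons, south=25.0, north=45.0, west=-95.0, east=-75.0)
print(subset["data"])  # [[2, 3], [5, 6]]
```

The user states the area of interest once and receives only the matching cells, instead of downloading whole tiles and cropping them locally.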
The map projection grids currently in use provide for two dimensional representations of the Earth surface. None of these projections is suitable for a global unified interaction with a digital Earth model. For that purpose, a uniform, continuous, conjugate, and global digital expression of the Earth sphere needs to be established.
Currently there are individual approaches being developed that include methodologies very similar to what is required, but such efforts need to be directed in a manner that results in standards that are adopted community-wide and industry-wide. These approaches use terms like "Octahedral Quaternary Triangular Mesh" and "Geodesic Grid" to describe methods for spherical surface tessellation based on polyhedrons.
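The kind of polyhedron-based tessellation referred to above can be illustrated with a short sketch: start from an octahedron inscribed in the unit sphere and repeatedly split each triangular face into four, projecting the new vertices back onto the sphere. This shows only the general subdivision idea under those assumptions; it does not reproduce the cell-addressing scheme of any particular published mesh, and the function names are invented for this example.

```python
import math


def normalize(p):
    """Project a 3-D point onto the unit sphere."""
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)


def midpoint(a, b):
    """Midpoint of two points, pushed back out onto the sphere."""
    return normalize(tuple((x + y) / 2.0 for x, y in zip(a, b)))


def octahedron():
    """The eight triangular faces of an octahedron inscribed in the unit sphere."""
    px, nx = (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)
    py, ny = (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)
    pz, nz = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
    return [
        (pz, px, py), (pz, py, nx), (pz, nx, ny), (pz, ny, px),
        (nz, py, px), (nz, nx, py), (nz, ny, nx), (nz, px, ny),
    ]


def subdivide(faces, levels):
    """Split every triangle into four, `levels` times."""
    for _ in range(levels):
        refined = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
        faces = refined
    return faces


mesh = subdivide(octahedron(), 2)
print(len(mesh))  # 8 * 4**2 = 128 triangles
```

Each level quadruples the number of cells, so the whole sphere is covered continuously at every resolution, without the seams of a two-dimensional projection grid.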
Continue the current EOS mission concept of a standard set of attitude and ephemeris data measurements and file structures. Also continue providing standard access and transformation libraries for these data sets, as are currently available in the EOS Toolkit.
We work with communities that build digital libraries for publishing data, data grids for sharing data, and persistent archives for preserving data. We use generic data management infrastructure (Storage Resource Broker) to implement all three environments. Based on interactions with these communities, the standard approach is to assemble a collection of data that is going to be shared. The collection can span multiple sites.
To manage data distributed across multiple sites, we had to implement virtualization mechanisms (you need at least 7). The first four are
The result is the ability to organize distributed data into a collection, share data between sites and users under access controls, discover data based on metadata attributes, and track all operations done on data.
The closest to your scenario is the sharing of data between data grids. This requires controls on sharing of resources (may I write on your storage), user names (who authenticates your name), files (which files are shared for whom), and metadata (who updates metadata to track operations on the files). Interchanges between data grids require specifying all of these constraints.
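The four kinds of constraint listed above could be recorded in a structure like the one sketched below. This is an illustrative data model only, not the Storage Resource Broker's actual configuration or API; the class name, field names, and example values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SharingAgreement:
    """Hypothetical record of what one data grid grants to a partner grid."""
    writable_resources: Set[str] = field(default_factory=set)        # storage the partner may write to
    trusted_authorities: Set[str] = field(default_factory=set)       # who may authenticate user names
    shared_files: Dict[str, Set[str]] = field(default_factory=dict)  # file -> users allowed to read it
    metadata_updaters: Set[str] = field(default_factory=set)         # who may update tracking metadata

    def may_read(self, user: str, authority: str, path: str) -> bool:
        """A read needs a trusted authenticator and an explicit grant on the file."""
        return (authority in self.trusted_authorities
                and user in self.shared_files.get(path, set()))

    def may_write(self, authority: str, resource: str) -> bool:
        """A write needs a trusted authenticator and a resource opened for writing."""
        return (authority in self.trusted_authorities
                and resource in self.writable_resources)


agreement = SharingAgreement(
    writable_resources={"disk-cache-1"},
    trusted_authorities={"grid-a"},
    shared_files={"/collection/sst/1999.nc": {"alice"}},
    metadata_updaters={"grid-a"},
)
print(agreement.may_read("alice", "grid-a", "/collection/sst/1999.nc"))
print(agreement.may_write("grid-a", "tape-archive"))
```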
white papers of ~1-2 pp from each volunteer contributor. Fred Huemmrich might have more information as he led one of the responses.
The CEOS WGISS group is actively attempting to develop a test system where data from different sources are projected into a common format, etc., upon user request. Any/all reformatting would occur transparently to the user (i.e., behind the scenes). The POC is "Tim B Smith". This project is in cooperation with CEOS WGCV, which is currently headed by a GSFC person (Steve Ungar). Jeff Morisette is the local NASA POC for the test system.