Session 3 - Open Science and Cloud Computing
Openscapes - Julia's presentation:
- (Q:) How can UWG/University collaborators use Openscapes resources?
- earthaccess can open cloud or on-premise resources and enables easier access to data from different DAACs by reducing the amount of code needed
- training sessions can include live coding
- this resource aims to be accessible to new users
- Earthdata Cloud Cookbook is a comprehensive resource for cloud computing. It has tutorials, how-to's, instructional videos, access to workshop materials
- Community can contribute to Openscapes via GitHub or Quarto, and material on the Openscapes community pages can be edited using the 'Edit this page' link on the right
- Mentors meet twice a month; community can choose 'Join as Mentor' to become an Openscapes mentor
- (Q:) How can researchers determine the range of costs to include in proposals for intended cloud implementation and use? (A:) Researchers should run small tests to ensure an efficient set-up for the intended operations, then run a larger test for confirmation. UWG questions remained after this response, and it was acknowledged that NASA needs to do more to support cost estimates and the accessibility of cloud compute resources for under-resourced scientists
- AWS has various storage options for building the most efficient system for computing needs
- Google Colab was suggested as a way to disseminate information to early learners and students
- (Q:) Can Openscapes also help scientists with workflows outside of the cloud? (A:) Yes! Go to About > Resources on the Openscapes website for scenarios or to request that a workflow for a specific application be developed
- (Q:) How will ocssw work in the cloud? Can the user community contribute to developing this?
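The small-test-then-larger-test approach suggested in the cost answer above can be sketched as a simple linear extrapolation. This is a rough illustration only, not NASA or Openscapes guidance: the function names, the hourly rate, the safety factor, and the data volumes are all hypothetical placeholders, and real costs also include storage, data transfer, and idle time.

```python
def estimate_compute_cost(test_hours, test_gb, full_gb, hourly_rate_usd, safety_factor=1.5):
    """Extrapolate compute cost from a small test run.

    Assumes runtime scales linearly with data volume, which the larger
    confirmation test is meant to check.
    """
    if test_hours <= 0 or test_gb <= 0:
        raise ValueError("test run must have positive duration and data volume")
    hours_per_gb = test_hours / test_gb
    full_hours = hours_per_gb * full_gb
    return full_hours * hourly_rate_usd * safety_factor


def scaling_ratio(small_hours, small_gb, large_hours, large_gb):
    """Per-GB runtime of the larger test divided by that of the small test.

    A value near 1.0 suggests the linear assumption holds; a value well
    above 1.0 means the small test underestimates the full cost.
    """
    return (large_hours / large_gb) / (small_hours / small_gb)


# Hypothetical numbers: a 0.5 h test on 10 GB, scaled to 2000 GB at $0.20/h.
proposal_estimate = estimate_compute_cost(0.5, 10, 2000, 0.20)

# Hypothetical confirmation run: 6 h on 100 GB.
ratio = scaling_ratio(0.5, 10, 6.0, 100)
```

If the ratio from the confirmation run is far from 1.0, the estimate should be redone from the larger test rather than the small one.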
Cloud Computing - Ian's presentation:
- NASA data are located in AWS Region us-west-2, so users interested in cost savings should plan to create the resources that work with these data in the same region.
- 3 cloud computing paradigms were discussed
- benefit of using Earthdata Cloud resources - Harmony does processing for you, transformation is supported (netCDF to cloud-optimized Zarr, for example)
- community-supported hubs should be considered as a place for users to cloud compute together
- all should investigate and share any free cloud-computing resources for consideration as longer-term solutions
- DAAC tutorials/notebooks are primarily done in Python. UWG did not feel that was an issue since Python is widely used, although support of Matlab would increase accessibility of tools to the entire user community
- A question was raised as to whether notebooks can be done using Matlab. Ian replied that you can set up a Matlab instance from JupyterHub (link to Mathworks resource on Openscapes site: https://nasa-openscapes.github.io/earthdata-cloud-cookbook/tutorials/matlab.html); however, Matlab is run through JupyterHub, so it is not as efficient as using Python
- A tutorial on ocssw will be shared on the Ocean Color site soon.
- (Q:) How do research communities identify the best way to use cloud environment? Who gets billed? Where should accounts be set up? What group can provide a platform and funding for cloud computing environments for researchers?
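The same-region advice above can be checked from inside a compute environment with a small helper. This is a minimal sketch, assuming the environment exposes its region through the standard `AWS_REGION` or `AWS_DEFAULT_REGION` environment variables; not every platform sets them, and the helper only reads configuration rather than verifying where the machine actually runs. The function and constant names are hypothetical.

```python
import os

# Region named in the notes above as the location of NASA data.
NASA_DATA_REGION = "us-west-2"


def current_region(env=None):
    """Return the configured AWS region, or None if neither variable is set."""
    env = os.environ if env is None else env
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


def in_data_region(env=None):
    """True when the configured region matches the NASA data region."""
    return current_region(env) == NASA_DATA_REGION


if not in_data_region():
    print("Compute is not configured for", NASA_DATA_REGION,
          "- cross-region access may be slower or incur extra cost.")
```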
SeaDAS - Aynur's presentation:
- SeaDAS team is exploring integration of SeaDAS into NASA’s cloud compute environment and what features can/should be offered in the cloud for processing
- PRISM support in ocssw is coming. (Q:) Will AVIRIS be included? David Thompson will work with SeaDAS team on establishing that.
- point of clarification: ocssw is native to Linux, but SeaDAS works on all platforms - Linux, Windows, Mac
- It might be useful to gather usage statistics for SeaDAS software to learn which features are most useful to the community, decide which features to include in upcoming versions and which could be sunset, and increase usage of existing tools across the user community.
- (Q:) Does SeaDAS plan to add tools for clustering data, data mining, data fusion, NRT data analysis?
- (Action) tutorial requested for 1) running L2gen from command line 2) toggle flags from command line 3) customization of satellite data processing using ocssw; OB.DAAC requested specific examples from UWG to help craft these tutorials.
- UWG asked SeaDAS team what challenges they encountered moving from developing for a sensor like VIIRS to OCI, which has more channels. (A:) Performance issue - OCI is slower to process
- the software package has also gotten larger, requiring more local machine capacity to download and install. Size information is available on the SeaDAS website under Downloads > SeaDAS Software Package
- discussion about SeaDAS usage led to a compare/contrast with other popular tools like Panoply. While UWG agrees that SeaDAS is much more powerful than a tool like Panoply, they noted a learning curve for visualization since it just accepts lat/lon. It was suggested that tools like ArcGIS may make for a better comparison with SeaDAS in terms of capability/ease of use
- ocssw and SeaDAS aren't frequently used together
- in-situ data can be matched up in SeaDAS, but the data must have a header in order for SeaDAS to be able to use it
- UWG question - Are there too many tools where there could be more collaboration on one tool? Is Openscapes the best place to converge on identifying/developing one tool?
- computational efficiency will be an important factor when deciding what to enable SeaDAS to do in the cloud and will encourage user adoption of a cloud instance.
- (Q:) Can the user community/UWG members contribute to defining ocssw in the cloud?
- Feature Requests for SeaDAS:
- allow 'save as' to let user save the file as a different format. This is currently solved by using 'export'
- 'Import code'/'code builder' button was found to be useful in a previous version. Will this come back?
- plotting spectral view using L2 data requires headers that currently have to be added manually in order to import and view in other tools. This becomes time-consuming since it has to be done for multiple files. Can SeaDAS include headers automatically?
- can spectral view be averaged?
- Issues:
- Users have trouble with Docker for Windows
- SeaDAS requires an internet connection because it automatically looks for ancillary data even if you already have it locally. This creates a lot of queries from the same IP, which ultimately gets the IP blocked by NASA. Can ancillary auto-find be disabled by the user?
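As a starting point for the requested command-line tutorial, the general shape of an l2gen call can be sketched from Python. This is an illustrative sketch, not an OB.DAAC example: l2gen takes `key=value` parameters such as `ifile`, `ofile`, and `l2prod`, but the file names and product list below are hypothetical placeholders, the helper name is invented for this sketch, and some sensors need additional parameters (a geolocation file, for instance).

```python
import shlex


def build_l2gen_command(ifile, ofile, products):
    """Build the argument list for an l2gen run.

    l2gen reads its options as key=value pairs; l2prod is a
    space-separated list of output products.
    """
    if not products:
        raise ValueError("at least one output product is required")
    return [
        "l2gen",
        f"ifile={ifile}",
        f"ofile={ofile}",
        "l2prod=" + " ".join(products),
    ]


# Hypothetical file names and products, for illustration only.
cmd = build_l2gen_command("input_L1B.nc", "output_L2.nc", ["chlor_a", "Rrs_443"])

# Show the shell-quoted command without executing it.
print(shlex.join(cmd))

# On a machine with ocssw installed, the command could then be run with:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list keeps the multi-word `l2prod` value as a single argument, so no manual shell quoting is needed when it is passed to `subprocess.run`.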
ARSET - Amita's presentation:
- trainings are designed around themes, climate is the latest addition
- trainings are free, and many build on prior trainings
- (Q:) Do ARSET and Openscapes collaborate with one another?
- trainings have attracted a wide range of participants from international locations
- designed to meet the audiences' knowledge level from 'beginner (fundamental)' to 'advanced'
- DAAC's overpass predictor supports citizen science
- training formats are lectures, workshops, Q&A etc and are based on user needs
- (Recommendation) OB.DAAC may want to schedule an ARSET training for pulling data from Earthdata Search rather than the Level 1 and 2 Browser.
- trainings are decided 1 year in advance. July/August is the request window for trainings, and requests must be received at least 6 months before the training date
- (Q:) Is there collaboration between ARSET and IOCCG?
- ARSET confirmed that when data changes, they provide updated information by conducting new trainings
- UWG asked whether there is a specific user group or subset of the Ocean Color community that could be better supported
UWG discussion - Day 2:
- Longer conversations on certain topics requested
- Is there a way to see what other DAACs are talking about at their UWGs? Yes!
- (Q:) Is there UWG interaction across DAACs? (A:) ESDIS is working on developing a combined working group that would have at least a representative from each UWG present to discuss commonalities of needs, issues and solutions. ESWG will be considered as an opportunity for this meeting.
- Cloud usage stats are available in EO Browser (ESA); this is worth considering for adoption in OB.DAAC tools so users can better understand and plan for computational costs
- Copernicus has a cloud computing system they put a lot of work into to ensure the lowest cost to user. This is an idea for how Earthdata's cloud computing system could consider lessening cost burden on users.
- Concerns about cloud computing at Universities were revisited in order for ESDIS to give their comments and reply to questions.
- Downloading NASA data from the cloud will remain free to users; computing costs would be the user’s responsibility
- UWG expressed concern about NASA HQ trending toward privatization of resources which can negatively affect science outcomes.
- UWG further discussed NASA actions to address accessibility of data and analysis; specifically, the recent focus/increase in funding for R2 institutions was acknowledged as significant and positive progress; however, privatization of resources is a ‘landscape wide’ problem for continuing to improve accessibility across government and academic institutions, and NASA needs to consider broader trends to ensure resources are appropriately allocated for these efforts
- UWG suggested collaboration between NASA, NSF, and NIH to create programs to improve coordination between government and academic institutions to confront inequities and avoid ‘cost traps’ that are common when switching to private services for research
- ESDIS will bring issues related to privatization up to HQ
- Openscapes offers AWS credits to do small computes for about 1 or 2 years.