The geoscience community is undergoing a pivotal transformation from unevenly distributed local computing resources to on-demand and cloud-based computing infrastructure. This transition is necessary to meet the growing computational needs of scientific research, educate the next generation of geoscientists, and tackle critical environmental challenges.

Cloud computing refers to the on-demand delivery of computing resources, such as processing power, data storage, and software, over the Internet, allowing users to access these resources without maintaining the local infrastructure that would otherwise provide them. Cloud computing is quickly becoming an essential tool for Earth and space scientists because scientific data are growing in volume and complexity and research demands ever more computing power. At the same time, “the cloud” is streamlining scientific workflows and enabling new kinds of analyses and collaborations across scientific disciplines.

Although the scientific benefits are clear, several challenges continue to slow the adoption of cloud computing: the need for training in new technologies, concerns about vendor lock-in, and the ongoing need to manage costs while maintaining a stable computing environment. CryoCloud, a NASA-funded, cloud-based computational research environment established in 2022, addresses these challenges with a transferable community framework that delivers greater scientific inclusivity, reproducibility, and collaboration.

CryoCloud’s Community Framework Catches On

The CryoCloud team partners with cloud engineers at the International Interactive Computing Collaboration (2i2c), a nonprofit, open-source cloud platform service provider for research and education. Together, they curate CryoCloud with tools and workflows designed for cryospheric researchers.

The CryoCloud community framework has established a stable, cost-effective, and long-lasting platform that promotes innovation, flexibility, and community empowerment (Figure 1). CryoCloud’s community focal point is JupyterHub, a cloud-based interactive computing environment. JupyterHubs offer a flexible and powerful user interface that integrates various tools and workflows, allowing users to write and execute code, visualize data, manage files, and collaborate in a single, unified workspace. CryoCloud minimizes technical barriers that can prevent researchers from transitioning to cloud computing by eliminating the need for cloud engineering expertise. This allows users to focus on what they do best: science.

Fig. 1. CryoCloud’s community framework model. OSS, open-source software. Credit: CryoCloud

CryoCloud lowers barriers to entry for new users of cloud computing by flattening the learning curve and broadening accessibility. Its front end uses familiar interfaces, so CryoCloud looks and feels like a local computing setup: users work in popular interactive development environments such as JupyterLab, Visual Studio Code (VS Code), and RStudio, in common programming languages. CryoCloud’s streamlined technology, training workshops, and ease of use open access to a wider range of researchers, allowing them to reap the benefits of cloud computing and develop effective computational research practices with fewer hurdles.

Since an initial pilot with the science team of NASA’s Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) in October 2022, the CryoCloud team has onboarded more than 500 scientists through in-person workshops, hackathon-style educational events, and open learning resources. CryoCloud has supported small hybrid community workshops such as the Future of Greenland Ice Sheet Science (FOGSS) workshop, as well as events at larger scientific meetings such as AGU’s Annual Meeting. Concurrently, the CryoCloud JupyterHub has served as the computing infrastructure for 12 hackathon-style educational and collaborative research workshops centered on NASA ICESat-2 altimetry data; QGreenland (a free mapping tool for interdisciplinary Greenland-focused research, teaching, decisionmaking, and collaboration); other satellite and airborne science missions; transdisciplinary science initiatives; and community software tools and skills. These educational hackathons teach participants data analysis, collaboration, and open science skills while offering unstructured time for coding, data exploration, and collaborative research ideation.

CryoCloud simplifies the use and adoption of advanced computing resources for students and scientists by providing a centralized JupyterHub environment, eliminating the need for participants to manage diverse local software installations and configurations. This allows event organizers and instructors to focus on hackathon content and learning objectives rather than technical setup. Additionally, instructors often contribute to CryoCloud’s growing knowledge base by documenting effective workflows that emerge during events [Fisher, 2023; Wong, 2024], which helps create more tools and templates for future educational events and user teams. Learning resources from these events are maintained and accessible through CryoCloud’s JupyterBook website.

Accessible, Flexible, Scalable, and Cost-Effective

CryoCloud simplifies access to cloud-based data sets, which are integral to NASA’s Open-Source Science Initiative. Storing cloud-hosted data in cloud-optimized formats speeds up data reads by a factor of 10 to 100 and lets users read in only the subsets they need, which uses computer memory more efficiently and shortens analysis times. With CryoCloud, researchers can rapidly search and stream these large, cloud-optimized data sets to better manage the increasing volume, velocity, and variety of big data that are now commonplace in geosciences research.
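To make the idea of subset reads concrete, here is a toy sketch in Python; it is not CryoCloud code. A grid is stored as separate fixed-size tiles, loosely modeled on how chunked formats such as Zarr keep each chunk as its own object, and a window read fetches only the tiles that overlap the request. The `ToyObjectStore` class and the `write_chunked` and `read_window` helpers are hypothetical, and a Python dictionary stands in for real cloud storage, so the sketch illustrates bytes saved rather than reproducing any measured speedup.

```python
import struct


class ToyObjectStore:
    """Hypothetical in-memory stand-in for cloud object storage; counts bytes fetched."""

    def __init__(self):
        self.objects = {}
        self.bytes_fetched = 0

    def put(self, key, payload):
        self.objects[key] = payload

    def get(self, key):
        payload = self.objects[key]
        self.bytes_fetched += len(payload)  # every fetch would cost network I/O
        return payload


def write_chunked(store, grid, chunk):
    """Store a 2D grid as chunk x chunk tiles, one object per tile ("row.col" keys)."""
    nrows, ncols = len(grid), len(grid[0])
    assert nrows % chunk == 0 and ncols % chunk == 0, "toy version needs full tiles"
    for ti in range(nrows // chunk):
        for tj in range(ncols // chunk):
            # Flatten the tile row by row, then pack it as little-endian doubles.
            tile = [v for row in grid[ti * chunk:(ti + 1) * chunk]
                    for v in row[tj * chunk:(tj + 1) * chunk]]
            store.put(f"{ti}.{tj}", struct.pack(f"<{len(tile)}d", *tile))


def read_window(store, chunk, r0, r1, c0, c1):
    """Return rows r0:r1 and columns c0:c1, fetching only the overlapping tiles."""
    out = [[0.0] * (c1 - c0) for _ in range(r1 - r0)]
    for ti in range(r0 // chunk, (r1 - 1) // chunk + 1):
        for tj in range(c0 // chunk, (c1 - 1) // chunk + 1):
            tile = struct.unpack(f"<{chunk * chunk}d", store.get(f"{ti}.{tj}"))
            for k, value in enumerate(tile):
                r, c = ti * chunk + k // chunk, tj * chunk + k % chunk
                if r0 <= r < r1 and c0 <= c < c1:
                    out[r - r0][c - c0] = value
    return out


# Usage: a 100 x 100 grid split into 100 tiles of 10 x 10 values each.
grid = [[float(r * 100 + c) for c in range(100)] for r in range(100)]
store = ToyObjectStore()
write_chunked(store, grid, chunk=10)
subset = read_window(store, chunk=10, r0=5, r1=15, c0=20, c1=25)
total_bytes = sum(len(p) for p in store.objects.values())
print(subset[0][0])                      # 520.0, i.e., grid[5][20]
print(store.bytes_fetched, total_bytes)  # 1600 80000: 2 of 100 tiles fetched
```

The small window touches only 2 of the 100 tiles, so about 2% of the stored bytes are fetched. A monolithic file without an index would typically have to be downloaded in full before the same subset could be extracted.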

In addition to accelerating scientific workflows for cloud-native data processing, CryoCloud has developed reproducible workflows, documented in its JupyterBook, for accessing non-cloud-hosted data and legacy data formats. These workflows simplify the time series and data fusion investigations required to quantify the complex processes that are changing our planet [National Academies of Sciences, Engineering, and Medicine, 2018].

Many NASA data sets have not yet been transitioned to cloud hosting or stored in cloud-optimized formats, so the CryoCloud community partners with data providers to improve existing and future data sets. CryoCloud advocates for full adoption of findable, accessible, interoperable, and reusable (FAIR) data standards and for serving data products in cloud-optimized formats. Meeting these standards maximizes user benefits, expedites cloud adoption, and keeps geoscientific computing at the state of the art.

CryoCloud is ready to use, with stable, prebuilt programming environments for Python, R, and MATLAB and for desktop applications such as QGIS. These environments eliminate the overhead of building one’s own software environment. For users with specialized needs, the system also offers the option to bring one’s own environment, so researchers with many different requirements can use CryoCloud’s community framework.

Cloud-computing resources are typically available on a pay-as-you-go basis, allowing JupyterHubs like CryoCloud to scale their usage to current needs. This approach avoids paying for idle hardware and lets servers scale seamlessly from the typical background usage of individual researchers and research teams to hosting 100+ workshop participants, for about $1 per person per day [Fisher, 2023]. The flexible CryoCloud infrastructure also allows individual researchers and research teams with large and diverse computing needs (leveraging artificial intelligence and machine learning methods, for instance) to bring their own cloud-computing credits into CryoCloud to access more powerful systems, including GPUs (graphics processing units), specialized processors that perform many computations in parallel.
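The savings from pay-as-you-go scaling can be illustrated with simple arithmetic. The Python sketch below compares a deployment that adds and removes nodes hour by hour with one that keeps enough capacity for the busiest hour running all day. The node size, price, and usage pattern are hypothetical placeholders, not CryoCloud or cloud-provider figures. In practice, autoscaling (for example, JupyterHub running on Kubernetes) also involves node startup delays and minimum node counts, which this sketch ignores.

```python
import math


def nodes_needed(users, users_per_node):
    """Smallest number of identical nodes that can host the given users."""
    return math.ceil(users / users_per_node)


def autoscaled_cost(hourly_users, users_per_node, price_per_node_hour):
    """Pay-as-you-go: each hour, pay only for the nodes that hour's users need."""
    return sum(nodes_needed(u, users_per_node) * price_per_node_hour
               for u in hourly_users)


def peak_provisioned_cost(hourly_users, users_per_node, price_per_node_hour):
    """Fixed capacity: keep enough nodes for the busiest hour running all day."""
    peak_nodes = nodes_needed(max(hourly_users), users_per_node)
    return peak_nodes * price_per_node_hour * len(hourly_users)


# Hypothetical day: 5 background users for 16 hours, then a 100-person
# workshop for 8 hours; 10 users per node at $0.50 per node-hour.
day = [5] * 16 + [100] * 8
print(autoscaled_cost(day, 10, 0.50))        # 48.0
print(peak_provisioned_cost(day, 10, 0.50))  # 120.0
```

Under these made-up numbers, autoscaling costs 40% of peak provisioning because the 10 workshop nodes are paid for only during the 8 workshop hours. The gap widens as workshops become shorter relative to the idle periods between them.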

Accelerating Scientific Discovery with Analysis Tools and Pipelines

CryoCloud has helped scientific teams debug and improve tools, making those tools more useful to a wider swath of cryospheric researchers. For example, it has supported a team from the University of Washington in building SlideRule Earth, a Python tool for on-demand processing of various remote sensing data products [Shean et al., 2023]. National Snow and Ice Data Center and ICESat-2 software developers work hand in hand with the CryoCloud team and users to create new cloud data access tools and workflows (such as the earthaccess and icepyx libraries) and to identify ongoing user needs. Geographic information system (GIS) tool developers have adapted CryoCloud to make desktop-only tools accessible through a cloud-based “virtual desktop,” expanding their usability and availability [Fisher, 2023].

The toolchain available in CryoCloud streamlines analysis workflows such as a polar Landsat sea surface temperature processing algorithm [Snow, 2023]. This cloud workflow replaces the onerous process of searching for, downloading, unzipping, and stacking the bands of Landsat scenes with only a few lines of code. With fast data streaming and full automation using open-source Python packages in active development by other CryoCloud users, the algorithm saves researchers weeks of work. It also eliminates the need to save many intermediate processing outputs, reducing data storage needs by 2 orders of magnitude. Making data processing pipelines reproducible and open allows future researchers to innovate more rapidly, accelerating scientific discovery.
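The streaming pattern described here, in which data are read and reduced on the fly rather than downloaded and saved at each step, can be sketched with chained Python generators. This is a hypothetical illustration, not the Landsat SST algorithm [Snow, 2023]: `stream_scenes` stands in for remote reads, its values are synthetic placeholders, and a per-scene mean stands in for real processing.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Scene:
    scene_id: str
    thermal: List[float]  # hypothetical per-pixel thermal-band values


def stream_scenes(scene_ids: Iterable[str]) -> Iterator[Scene]:
    """Hypothetical stand-in for streaming scenes from cloud storage on demand.

    A real workflow would read only the needed band over the network; here we
    yield small synthetic lists so the sketch runs offline.
    """
    for i, scene_id in enumerate(scene_ids):
        yield Scene(scene_id, [270.0 + i, 271.0 + i, 272.0 + i])


def summarize(scenes: Iterable[Scene]) -> Iterator[Tuple[str, float]]:
    """Reduce each scene to one value as it arrives; nothing is written to disk."""
    for scene in scenes:
        yield scene.scene_id, sum(scene.thermal) / len(scene.thermal)


# Both stages are lazy generators: each scene is produced, reduced, and dropped
# before the next one is requested, and only the final summaries are kept.
results = dict(summarize(stream_scenes(["scene_a", "scene_b", "scene_c"])))
print(results)  # {'scene_a': 271.0, 'scene_b': 272.0, 'scene_c': 273.0}
```

Because no stage materializes its full output, intermediate products never need to be saved, which is the general mechanism behind the storage savings described above.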

An Open Science Blueprint for Democratizing Science

By facilitating sharing of data, code, tools, and scientific workflows across multi-institutional teams, CryoCloud is more than a computational tool; it is a nexus for scientific innovation, collaboration, and community building. By fostering this open science environment, CryoCloud is laying the groundwork for a more collaborative and transparent approach to scientific research, and it is helping to unify previously siloed groups into an integrated community focused on understanding Earth’s cold regions. CryoCloud’s cofounder and lead, Tasha Snow, was recognized with AGU’s 2023 Open Science Recognition Prize [Mines Staff, 2023] for building CryoCloud upon these open science principles.

The collaboration between CryoCloud and 2i2c has developed a versatile platform that transcends cryospheric research. Its adaptable architecture allows for standardized and reproducible research environments that other scientific fields can use as a blueprint for their transition to cloud-based research [Snow et al., 2023]. This adaptability is crucial in an era in which interdisciplinary research is becoming increasingly important for tackling complex scientific challenges. The CryoCloud community framework is durable, flexible, and cost-effective.

Cloud-based interactive computing platforms like CryoCloud give scientists access to computational resources scaled to their needs, without requiring expensive, unequally distributed local infrastructure. This design democratizes access to cost-efficient computational resources across a diverse swath of researchers, including early-career scientists, members of marginalized groups, and researchers at smaller institutions and in developing countries. Initially established with funding from NASA’s ICESat-2 mission to serve its science team, CryoCloud has been made freely accessible to the NASA cryospheric research community through funding from NASA and NASA’s Transform to Open Science program. This democratization of science is a crucial step toward a more inclusive and equitable scientific enterprise.

A Changing Research Ecosystem

CryoCloud represents a shift in how scientists conduct research and education. Researchers are learning and developing the tools necessary to fully harness the big data revolution. Small, siloed research groups are transforming into transdisciplinary research networks. CryoCloud makes these shifts possible through a community-empowered, self-sustaining ecosystem.

In this ecosystem, users and developers coproduce improvements to existing resources and tools, and they are committed to developing innovative approaches to research and education that maximize every research dollar. The CryoCloud community framework reimagines research to make collaboration and accessibility keystones of scientific discovery and provides a blueprint for any scientific community to transition to cloud-based computing.

Acknowledgments

We gratefully acknowledge Jessica Scheick, Wei Ji Leong, Scott Henderson, Fernando Pérez, Matthew Fisher, James Munroe, Yuvi Panda, Sarah Gibson, Erik Sundell, and Ellianna Abrahams for their contributions to the CryoCloud project. CryoCloud receives funding support from NASA’s Transform to Open Science program (grant 80NSSC23K0002) and the NASA Cryosphere Program and ICESat-2 Science Team (grant 80NSSC22K1877).

References

Fisher, M. (2023), Desktop GIS software in the cloud with JupyterHub, Medium, blog.jupyter.org/desktop-gis-software-in-the-cloud-with-jupyterhub-ddced297019a.

Mines Staff (2023), Tasha Snow receives 2023 AGU Open Science Recognition Prize, Mines News Room, 14 Sept., minesnewsroom.com/news/tasha-snow-receives-2023-agu-open-science-recognition-prize.

National Academies of Sciences, Engineering, and Medicine (2018), Thriving on Our Changing Planet: A Decadal Strategy for Earth Observation from Space, Natl. Acad. Press, Washington, D.C., https://doi.org/10.17226/24938.

Shean, D., et al. (2023), SlideRule: Enabling rapid, scalable, open science for the NASA ICESat-2 mission and beyond, J. Open Source Software, 8(81), 4982, https://doi.org/10.21105/joss.04982.

Snow, T. (2023), Landsat SST algorithm, Zenodo, https://doi.org/10.5281/zenodo.8240320.

Snow, T., et al. (2023), CryoCloud JupyterBook (2023.01.26), Zenodo, https://doi.org/10.5281/zenodo.7576602.

Wong, J. (2024), Keeping PACE with GPU enabled compute to detect global cloud cover using satellite data, 2i2c blog, 2i2c.org/blog/2024/pace-hackweek/.

Author Information

Wilson Sauthoff (sauthoff@mines.edu), Colorado School of Mines, Golden; Tasha Snow, NASA Goddard Space Flight Center, Greenbelt, Md.; also at Earth System Science Interdisciplinary Center, University of Maryland, College Park; Joanna D. Millstein, Colorado School of Mines, Golden; James Colliander, International Interactive Computing Collaboration and University of British Columbia, Vancouver, Canada; and Matthew R. Siegfried, Colorado School of Mines, Golden

Citation: Sauthoff, W., T. Snow, J. D. Millstein, J. Colliander, and M. R. Siegfried (2024), Democratizing science in the cloud, Eos, 105, https://doi.org/10.1029/2024EO240385. Published on 30 August 2024.
This article does not represent the opinion of AGU, Eos, or any of its affiliates. It is solely the opinion of the author(s).
Text © 2024. The authors. CC BY-NC-ND 3.0
Except where otherwise noted, images are subject to copyright. Any reuse without express permission from the copyright owner is prohibited.