The Earth System Data Science initiative aims to profoundly increase the effectiveness of the NCAR/UCAR workforce by promoting deeper collaboration centered on analytics, improving our capacity to deliver impactful, actionable, reproducible science and serve the university community by transforming how geoscientists synthesize and extract information from large, diverse data sets.
Effective synthesis and analysis of large datasets is a rate-limiting step in advancing science across NCAR/UCAR and our community. Recent developments in open-source scientific software, notably those identified by the Pangeo community, provide both inspiring technical solutions to Big Data geoscience problems and paradigms for large-scale collaboration. Analysis workflows across NCAR/UCAR share much in common, despite disciplinary differences; thus, a focus on data and its transformation into useful information illuminates the potential for novel collaborations across the organization. Moreover, open, collaborative development focused on reproducible science holds transformative potential for how NCAR/UCAR and the community approach science.
Collaboration. ESDS explicitly seeks to build collaborative networks oriented toward solving Big Data analysis challenges. Many of our scientific staff spend large fractions of their time writing analysis codes. Agreeing to work together on the development and application of a common set of tools will yield increasing returns in the form of shared knowledge and new opportunities for sustainable community projects.
Inclusivity. ESDS will foster a welcoming culture, promoting open communication and proactively encouraging engagement and contributions from across NCAR/UCAR and the university community. We recognize that the data science and software engineering skills our research disciplines demand constitute a major barrier to entry, which limits the potential diversity of our community. ESDS will explicitly develop training opportunities, improving the technical expertise of our staff and identifying strategies to entrain underrepresented groups, including at the undergraduate and graduate levels.
Open Development. Science at NCAR/UCAR is data intensive. We require sustained innovation in our ability to efficiently extract useful information from large, diverse data sets—thus software development is an intrinsic component of modern science. Open Development actively engages the target software user community in the development, support, maintenance, and documentation of software products. Open Development has proven effective at producing critical software tools—notably by adopting a particular, structured paradigm for collaboration. The primary goal of ESDS is to effect a cultural transformation at NCAR/UCAR centered around this Open Development paradigm, enabling us to work together in fundamentally new ways and promoting more extensive collaboration on analysis tools.
Reproducible Science. Reproducible Science has the potential to be a transformative paradigm, helping to effect a transition toward making science actionable. Scientific papers remain the primary currency of scientific discourse, yet this model is deficient in several respects. Most significantly, papers are not easily extensible; they comprise a version of knowledge rendered into a static narrative. Software and its documentation, by contrast, encapsulate essential scientific knowledge in executable form, embracing iterative refinement and enabling reproducibility, which is a cornerstone of best practices in computational science. Moreover, software yields a reusable product, opening opportunities for integration and extension of applications to address stakeholder requirements, including beyond our traditional research communities. This is critical in the context of NCAR/UCAR’s mission to translate science into actionable, societally relevant information.
- Promote partnerships between software engineers and scientists to drive collaborative, Open Development of reusable geoscience workflows, novel approaches in machine learning, and Reproducible Science.
- Increase the literacy of the NCAR/UCAR workforce in the best practices that support the development and application of scalable analysis workflows, as well as in the cultural principles underpinning a capacity for sustained innovation in the context of emerging disruptive computing technology; explicitly engage with the university community on these efforts, and promote broader participation.
- Promote more extensive communication and sharing of workflows between scientists, leading to improved productivity, collaboration, and increased potential for efforts culminating in collective products.
- Coordinate and improve deployments of interactive computing environments and analysis-ready datasets.
- Identify and interlink existing initiatives related to ESDS across NCAR/UCAR.