Update source code of ORE manuscript #585

Merged
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -5,7 +5,7 @@ repository: https://github.com/iamconsortium/pyam
version: 1.0
license: Apache-2.0
journal: Open Research Europe
doi: 10.12688/openreseurope.13633.1
doi: 10.12688/openreseurope.13633.2
authors:
- family-names: Huppmann
given-names: Daniel
10 changes: 7 additions & 3 deletions README.md
@@ -13,8 +13,8 @@ pyam: analysis & visualization <br /> of integrated-assessment and macro-energy
[![ReadTheDocs](https://readthedocs.org/projects/pyam-iamc/badge/?version=latest)](https://pyam-iamc.readthedocs.io/en/latest/?badge=latest)
[![codecov](https://codecov.io/gh/IAMconsortium/pyam/branch/main/graph/badge.svg)](https://codecov.io/gh/IAMconsortium/pyam)

[![doi](https://zenodo.org/badge/113359260.svg)](https://zenodo.org/badge/latestdoi/113359260)
[![ORE](https://img.shields.io/badge/ORE-10.12688/openreseurope.13633.1-blue)](https://doi.org/10.12688/openreseurope.13633.1)
[![doi](https://zenodo.org/badge/113359260.svg)](https://doi.org/10.5281/zenodo.1470400)
[![ORE](https://img.shields.io/badge/ORE-10.12688/openreseurope.13633.2-blue)](https://doi.org/10.12688/openreseurope.13633.2)
[![joss](https://joss.theoj.org/papers/10.21105/joss.01095/status.svg)](https://joss.theoj.org/papers/10.21105/joss.01095)
[![groups.io](https://img.shields.io/badge/listserv-groups.io-blue)](https://pyam.groups.io/g/forum)
[![slack](https://img.shields.io/badge/chat-Slack-orange)](https://pyam-iamc.slack.com)
@@ -103,6 +103,10 @@ Scientific publications
The following manuscripts describe the **pyam** package
at specific stages of development.

The source documents are available in
the [manuscripts](https://github.com/IAMconsortium/pyam/tree/main/manuscripts) folder
of the GitHub repository.

### Release v1.0 (June 2021)

Published to mark the first major release of the **pyam** package.
@@ -113,7 +117,7 @@ Maarten Brinkerink, Maik Budzinski, Florian Maczek, Sebastian Zwickl-Bernhard,
Lara Welder, Erik Francisco Alvarez Quispe, and Christopher J. Smith.
*pyam: Analysis and visualisation of integrated assessment and macro-energy scenarios.*
**Open Research Europe**, 2021.
doi: [10.12688/openreseurope.13633.1](https://doi.org/10.12688/openreseurope.13633.1)
doi: [10.12688/openreseurope.13633.2](https://doi.org/10.12688/openreseurope.13633.2)

### Release v0.1.2 (November 2018)

1 change: 1 addition & 0 deletions RELEASE_NOTES.md
@@ -6,6 +6,7 @@ The dependencies were updated to require `xlrd>=2.0` (previously `<2.0`) and `op

## Individual updates

- [#585](https://github.com/IAMconsortium/pyam/pull/585) Include revisions to the ORE manuscript source code following acceptance/publication
- [#583](https://github.com/IAMconsortium/pyam/pull/583) Add profiler module for performance benchmarking
- [#579](https://github.com/IAMconsortium/pyam/pull/579) Increase performance of IamDataFrame initialization
- [#572](https://github.com/IAMconsortium/pyam/pull/572) Unpinned the requirements for xlrd and added openpyxl as a requirement to ensure ongoing support of both `.xlsx` and `.xls` files out of the box
14 changes: 9 additions & 5 deletions doc/source/index.rst
@@ -38,10 +38,10 @@ Release v\ |version|.
:target: https://codecov.io/gh/IAMconsortium/pyam

.. |doi| image:: https://zenodo.org/badge/113359260.svg
:target: https://zenodo.org/badge/latestdoi/113359260
:target: https://doi.org/10.5281/zenodo.1470400

.. |ore| image:: https://img.shields.io/badge/ORE-10.12688/openreseurope.13633.1-blue
:target: https://doi.org/10.12688/openreseurope.13633.1
.. |ore| image:: https://img.shields.io/badge/ORE-10.12688/openreseurope.13633.2-blue
:target: https://doi.org/10.12688/openreseurope.13633.2

.. |joss| image:: https://joss.theoj.org/papers/10.21105/joss.01095/status.svg
:target: https://joss.theoj.org/papers/10.21105/joss.01095
@@ -129,6 +129,10 @@ Scientific publications

The following manuscripts describe the package at specific stages of development.

The source documents are available in the manuscripts_ folder of the GitHub repository.

.. _manuscripts: https://github.com/IAMconsortium/pyam/tree/main/manuscripts

Release v1.0 (June 2021)
~~~~~~~~~~~~~~~~~~~~~~~~

@@ -142,12 +146,12 @@ Published to mark the first major release of the |pyam| package.
Lara Welder, Erik Francisco Álvarez Quispe, and Christopher J. Smith.
| *pyam: Analysis and visualisation of integrated assessment and macro-energy scenarios.*
| **Open Research Europe**, 2021.
doi: `10.12688/openreseurope.13633.1 <https://doi.org/10.12688/openreseurope.13633.1>`_
doi: `10.12688/openreseurope.13633.2 <https://doi.org/10.12688/openreseurope.13633.2>`_

Release v0.1.2 (November 2018)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Published following the successful application of **pyam**
Published following the successful application of |pyam|
in the IPCC SR15 and the Horizon 2020 CRESCENDO project.

.. highlights::
26 changes: 22 additions & 4 deletions manuscripts/ORE/source/chapters/datamodels.rst
@@ -1,11 +1,21 @@
Data models and formats used by the energy & climate modelling communities
==========================================================================

When researchers in the domain of energy modelling and climate science hear the term
"model", they usually think of numerical tools to compute results from given inputs.
This section is about a different type of model.

A "data model" is an abstract description of the structure of information.
Numerous concepts are in use in the domain of integrated assessment,
energy systems modelling and climate science.
This section describes several commonly used concepts in the integrated-assessment
community as well as energy systems, macro-energy and climate modelling.
It can refer to timeseries data, static characteristics of technologies or resources,
or any other numerical information. In essence, a table with clear rules
on the kind of values in each column is already a data model.

Numerous concepts are in use in the domain of energy systems modelling and
climate science to store reference data, facilitate exchange of data between models,
or make results available to other users.
This section describes commonly used data models and related formats in the
integrated-assessment community as well as the domain of energy systems,
macro-energy, and climate modelling.
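
The notion that a table with clear rules for each column already constitutes
a data model can be illustrated with a minimal Python sketch.
The column names, the rules and the example values below are hypothetical
and chosen for illustration only; they are not a schema defined in this manuscript
and they do not use the pyam API.

```python
from numbers import Real

# Hypothetical column rules for a small timeseries table; the column names
# and rules are illustrative assumptions, not a schema from the manuscript.
COLUMN_RULES = {
    "region": lambda v: isinstance(v, str) and v != "",
    "variable": lambda v: isinstance(v, str) and v != "",
    "unit": lambda v: isinstance(v, str),
    "year": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "value": lambda v: isinstance(v, Real) and not isinstance(v, bool),
}


def validate_row(row):
    """Return a list of rule violations for one row (empty if the row conforms)."""
    errors = []
    for column, rule in COLUMN_RULES.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not rule(row[column]):
            errors.append(f"invalid value in column: {column}")
    # Columns that the data model does not know about are also violations.
    for column in row:
        if column not in COLUMN_RULES:
            errors.append(f"unexpected column: {column}")
    return errors


# Made-up example rows: the first conforms, the second has a text year
# and lacks the "value" column.
good = {"region": "World", "variable": "Emissions|CO2", "unit": "Mt CO2/yr",
        "year": 2020, "value": 35000.0}
bad = {"region": "World", "variable": "Emissions|CO2", "unit": "Mt CO2/yr",
       "year": "2020"}

print(validate_row(good))
print(validate_row(bad))
```

Any table whose rows pass such checks can be exchanged between tools
without further agreement on its structure, which is the practical purpose
of the data models described in this section.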

The IAMC format
---------------
@@ -128,6 +138,9 @@ lazy data handling.
It should be noted that the ESMValTool supports programming languages other than Python,
with the aim of being as open as possible.

Bridging the gap between integrated assessment and climate science
------------------------------------------------------------------

Beyond the CMIP archive, there are a myriad of other data formats and conventions
within the climate literature.
Of these, the most relevant to the integrated-assessment community is
@@ -142,3 +155,8 @@ the assessment by Working Group 3 of the IPCC.
To extract data from the CMIP archive into the scmdata format,
the package `netCDF-SCM <https://gitlab.com/netcdf-scm/netcdf-scm>`_ was developed
:cite:`Nicholls:2021:CMIPdata`.

The pyam package was initiated based on the IAMC format and the work done to foster the
link between the integrated-assessment community and the climate sciences.
The following section describes the design principles of the package and
the generalized data model for which it can be applied.
156 changes: 95 additions & 61 deletions manuscripts/ORE/source/chapters/intro.rst
@@ -39,16 +39,16 @@ for energy systems modelling
as well as integrated assessment of climate change and sustainable development
to allow sensible defaults and remove as much clutter as possible from
scenario processing workflows or analysis scripts.
Using a standardized, well-structured toolbox rather than one's own custom methods
can also reduce the scope for errors and improve the reliability and readability
of scenario processing code.

An overview of existing packages and tools
------------------------------------------

Several open-source packages and tools exist in between the general-purpose packages
for data analysis and plotting, on the one hand, and dedicated data processing solutions
specifically built around a specific modelling framework, on the other,
see :numref:`overview`.
These packages are compatible with a variety of data formats
commonly used in energy systems modelling and integrated assessment.
specifically built around a specific modelling framework, on the other hand.

.. _overview:

@@ -58,82 +58,116 @@ commonly used in energy systems modelling and integrated assessment.
Overview of packages & tools for energy system & integrated assessment modelling
(see the Appendix for a full list of references and links cited in this figure)

These packages can be grouped into four categories; we provide examples
in each category for illustrative purposes:

1. *Data processing, computation and validation of input data and scenario results*:

The R package `madrat <https://github.com/pik-piam/madrat>`_ provides a framework
for improving reproducibility and transparency in data processing
:cite:`Dietrich:2021:madrat`.

In comparison, the R package `iamc <https://github.com/iamconsortium/iamc>`_ is a collection
of functions for data analysis and diagnostics of scenario results in the IAMC format
(see the following section on data models for more information).

The Python package `genno <https://genno.readthedocs.io>`_ supports describing and
executing complex calculations on labelled, multi-dimensional data; it was developed
as a generalization of data processing in the context of integrated assessment and transport modelling.

2. *Visualization of scenario results in a domain-specific format*:

The R package `mipplot <https://github.com/UTokyo-mip/mipplot>`_ generates plots
from climate mitigation scenarios :cite:`Yiyi:2021:mipplot`.
It is also based on the IAMC format.

3. *Reference data management for model input & calibration*:

The Public Utility Data Liberation (`PUDL <https://catalyst.coop/pudl>`_) project
takes publicly available information and makes it usable by cleaning, standardizing,
and cross-linking utility data from different sources in a single database.

In a similar effort, `PowerGenome <https://github.com/PowerGenome/PowerGenome>`_
compiles different data sources into a single database.
The packages on the left-hand side of :numref:`overview` are powerful, general-purpose,
domain-agnostic solutions for data science.
In contrast, in the top-right corner is a selection of several widely used modelling
frameworks that come with dedicated analysis and visualization features "hard-wired"
to their implementation.

The `PowerSystems.jl <https://github.com/NREL-SIIP/PowerSystems.jl>`_ package
provides a rigorous data model to enable power systems analysis and modelling
across several input formats.

4. *Comprehensive database solutions for management of scenario input data and results*:

The `Open Energy Platform <https://openenergy-platform.org>`_ aims to ensure quality,
transparency and reproducibility in energy system research. It is a collaborative
community effort to develop various tools and information that help working
with energy-related data.

The `Spine Toolbox <https://spine-toolbox.readthedocs.io>`_ is a modular and
adaptable end-to-end energy modelling ecosystem to enable open, practical, flexible
and realistic planning of European energy grids.

The pyam package covers both the data processing and validation aspects (category 1)
as well as a suite of plotting features (category 2).
It also provides direct interfaces to reference data sources (category 3)
and can be integrated with existing community database solutions (category 4).
Due to this wide scope, it is a novel and - we hope - useful addition
to the suite of tools used by the energy systems and integrated-assessment communities.
In the middle of the figure are several packages and tools that are not customized
to any particular modelling framework, but are geared for broader use in the domain
of energy systems and integrated assessment modelling.
These packages are compatible with a variety of data formats
commonly used by the respective research communities.

The R package `madrat <https://github.com/pik-piam/madrat>`_ provides a framework
for improving reproducibility and transparency in data processing.
It enables the definition and execution of workflows that are frequently used
in this research domain :cite:`Dietrich:2021:madrat`.
In comparison, the R package `iamc <https://github.com/iamconsortium/iamc>`_ is a collection
of functions for data analysis and diagnostics of scenario results in the IAMC format,
a domain-specific format widely used for climate mitigation scenarios
(see the following section on data models for more information).
The Python package `genno <https://genno.readthedocs.io>`_ supports describing and
executing complex calculations on labelled, multi-dimensional data; it was developed
as a generalization of data processing in the context of integrated assessment
and transport modelling.

In contrast, the R package `mipplot <https://github.com/UTokyo-mip/mipplot>`_
is a solution for visualization of scenario results related
to climate mitigation :cite:`Yiyi:2021:mipplot`.
It is also based on the IAMC format.

The pyam package, similar to the pandas package in the general-purpose "column"
of the figure, provides features and methods both for data processing and for
visualization and plotting. It was developed specifically to support workflows
for, and analysis of, input data and results of energy system models
like those shown in the top-right corner of the figure.

As one additional group of relevant packages for the energy systems modelling domain,
the figure shows several tools for reference data compilation (model input)
and storage of scenario results (model output):

The Public Utility Data Liberation (`PUDL <https://catalyst.coop/pudl>`_) project
takes publicly available information and makes it usable by cleaning, standardizing,
and cross-linking utility data from different sources in a single database.
In a similar effort, `PowerGenome <https://github.com/PowerGenome/PowerGenome>`_
compiles different data sources into a single database.
The `friendly_data <https://github.com/sentinel-energy/friendly_data>`_ package
implements an adaptation to make the frictionless datapackage standard
more easily usable in the energy systems domain.
The `PowerSystems.jl <https://github.com/NREL-SIIP/PowerSystems.jl>`_ package
provides a rigorous data model to enable power systems analysis and modelling
across several input formats.
The `Open Energy Platform <https://openenergy-platform.org>`_ aims to ensure quality,
transparency and reproducibility in energy system research. It is a collaborative
community effort to develop various tools that help users work with and share
energy-related data across the entire modelling workflow.

These tools are valuable to facilitate the use of consistent data when calibrating
or evaluating models, and they simplify the process to share and compare results
across modelling frameworks.
Alas, these tools still suffer from fragmentation and incompatible data formats.
To integrate them with either a general-purpose data science package or
a specific modelling framework requires substantial effort.

A Python package for scenario analysis & visualization
------------------------------------------------------

We believe that pyam can serve as a useful "bridge" between different modelling
frameworks, or between models and various data management solutions.
Due to its wide scope encompassing various aspects of data science and visualization
options, it can be a valuable addition to the suite of tools
used by the energy systems and integrated assessment modelling communities.

The pyam package grew out of complementary efforts in the Horizon 2020 project
`CRESCENDO <https://www.crescendoproject.eu>`_ and the analysis of integrated-assessment scenarios
supporting the IPCC's *Special Report on Global Warming of 1.5°C*.
Ref :cite:`Gidden:2019:pyam` describes an earlier version of its features and capabilities.
After three years of development, we believe that the package has now reached
a reasonable level of maturity to be useful to a wider audience -
An earlier manuscript describes its features and capabilities at that time
:cite:`Gidden:2019:pyam`.
After more than two years of further development, we believe that the package has now
reached a reasonable level of maturity to be useful to a wider audience -
in scientific-software jargon, it is ready for **release 1.0**.

The aim of the package is not to provide complex new methodologies
or sophisticated plotting features. Instead, the aim is to provide a toolbox
or sophisticated plotting features. Instead, the vision is to provide a toolbox
for many small operations and processing steps that a researcher or analyst frequently
needs when working with numerical scenarios of climate change mitigation
and the energy system transition:
aggregation & downscaling, unit conversion, validation,
and a simple plotting library to quickly get an intuition of the scenario data.
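
To make two of these operations concrete, the following is a minimal sketch
of aggregation and unit conversion on a long-format table, written in plain Python.
It does not use the pyam API; the helper functions ``aggregate`` and ``convert_unit``
as defined here, the record layout and all numerical values are assumptions
made for illustration only.

```python
from collections import defaultdict

# Illustrative records (region, variable, unit, year, value); all values are made up.
records = [
    ("World", "Primary Energy|Coal", "EJ/yr", 2020, 150.0),
    ("World", "Primary Energy|Gas", "EJ/yr", 2020, 130.0),
    ("World", "Primary Energy|Coal", "EJ/yr", 2030, 120.0),
    ("World", "Primary Energy|Gas", "EJ/yr", 2030, 140.0),
]


def aggregate(rows, parent):
    """Sum the direct sub-categories 'parent|<name>' per region, unit and year."""
    totals = defaultdict(float)
    prefix = parent + "|"
    for region, variable, unit, year, value in rows:
        # Only direct children count, i.e. no further '|' after the prefix.
        if variable.startswith(prefix) and "|" not in variable[len(prefix):]:
            totals[(region, unit, year)] += value
    return [
        (region, parent, unit, year, total)
        for (region, unit, year), total in sorted(totals.items())
    ]


def convert_unit(rows, current, to, factor):
    """Rescale values given in unit `current` to unit `to`; other rows pass through."""
    return [
        (region, variable, to, year, value * factor)
        if unit == current
        else (region, variable, unit, year, value)
        for region, variable, unit, year, value in rows
    ]


total = aggregate(records, "Primary Energy")
total_pj = convert_unit(total, "EJ/yr", "PJ/yr", 1000.0)  # 1 EJ = 1000 PJ
print(total_pj)
```

A dedicated toolbox wraps steps like these behind tested, documented methods,
so that analysis scripts do not have to re-implement them.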

The package can be used for results generated from any model in the domains listed above
or from related reference sources, provided that the data has a sectoral, spatial and temporal
dimension.
While we use the term "timeseries" throughout this manuscript, pyam can handle data
that has only one level of regional and temporal resolution,
e.g., global CO2 emissions in one specific year.

By following the design of `pandas <https://pandas.pydata.org>`_ and other mature,
well-established packages, it can appeal to a broad range of user groups:

- Modelers generating scenario results using their own tools and frameworks,
as well as researchers and analysts working with existing scenario ensembles
such as those supporting the IPCC reports or produced in research projects.
- Users that want to add a particular step to an existing scenario processing workflow
as well as modelers that are starting scenario analysis from scratch.
- Python experts as well as novice users of this programming language.

This manuscript describes the design principles of the package
and the types of data that can be handled.
We present a number of features and recent applications
to illustrate the usefulness of pyam.
to illustrate the usefulness of pyam, and we point to the tutorials that
can help potential users to decide whether the pyam package may be suitable for them.
In the last section, we identify several forthcoming use cases
and planned developments.
11 changes: 9 additions & 2 deletions manuscripts/ORE/source/chapters/outlook.rst
@@ -49,11 +49,18 @@ respond to bug reports.
At the same time, there is an important role for (non-expert) users:
suggesting new features to improve the usefulness of the package,
contributing to the development of tutorials,
and answering questions from new users via the community Slack channel and mailing list.
and answering questions from new users via the community
`Slack channel <https://pyam-iamc.slack.com>`_ and
`mailing list <https://pyam.groups.io>`_.

By virtue of being applied in several ongoing Horizon 2020 projects and the IPCC AR6 process,
The just-starting Horizon 2020 project *European Climate and Energy Modelling Forum*
(`ECEMF <https://ecemf.eu>`_) will develop model linkages and tools
based on or compatible with the pyam package.
By virtue of being applied in this and several other ongoing Horizon 2020 projects
as well as the IPCC AR6 process,
we are confident that the package will attract new users and continuously evolve
to meet changing requirements for scenario analysis and data visualization.

At the same time, the solid foundation of continuous-integration workflows,
comprehensive test coverage and detailed documentation minimizes the risk
of inadvertently breaking existing scripts and causing frustration amongst