Rest of gases #54

Merged
merged 56 commits into from
Jun 21, 2024
Commits
d32b125
Add NOAA surface flask SF6 download and processing
znichollscr May 21, 2024
ddeb8e2
Add NOAA HATS SF6 download and processing
znichollscr May 21, 2024
6f841b6
Start on SF6
znichollscr May 21, 2024
65cf1cf
Up to writing SF6 files
znichollscr May 22, 2024
e6d8e25
Update input4mips-validation repo
znichollscr May 22, 2024
afce790
Move observational network binning file requirements into function
znichollscr May 22, 2024
5cc3e1c
Switch to generic historical emissions for SF6-like gases [ci skip]
znichollscr May 22, 2024
c552159
Finish sorting out placeholders
znichollscr May 22, 2024
cb85dda
Get SF6-like crunching into pydoit setup
znichollscr May 22, 2024
5c41226
Start on CFC-11 [ci skip]
znichollscr May 22, 2024
3944400
Update expected outputs
znichollscr May 23, 2024
af1c388
mypy
znichollscr May 23, 2024
2a2284b
Add CFC12 and HFC134a
znichollscr May 23, 2024
5fd8431
mypy
znichollscr May 23, 2024
045ef3f
Update expected outputs
znichollscr May 23, 2024
a71ac01
Update input4mips-validation version
znichollscr May 23, 2024
cd8b2cf
Add more gases
znichollscr May 23, 2024
fa1dcd0
Get CFC114 writing
znichollscr May 23, 2024
6de202f
Get CFC115 writing
znichollscr May 23, 2024
a48b6b2
Get CH2CL2 writing
znichollscr May 23, 2024
c3f2835
Get ch3br writing
znichollscr May 23, 2024
ce423c5
Get ch3ccl3 writing [ci skip]
znichollscr May 23, 2024
f9bc147
Get ch3cl writing
znichollscr May 23, 2024
766f213
Get chcl3 writing
znichollscr May 23, 2024
ad28647
Get halon1211 writing [ci skip]
znichollscr May 23, 2024
f60b442
Increase tolerance
znichollscr May 23, 2024
78087a5
Get halon2402 writing
znichollscr May 23, 2024
d220ede
Get halon1301 writing
znichollscr May 23, 2024
c482c2c
Get hcfc141b writing
znichollscr May 23, 2024
f230586
Get hcfc142b writing
znichollscr May 23, 2024
41da91f
Get hcfc22 writing
znichollscr May 23, 2024
ee265d3
Get hfc125 writing
znichollscr May 23, 2024
7b019bc
Get hfc143a writing
znichollscr May 23, 2024
c42fd0a
Get hfc152a writing
znichollscr May 23, 2024
51fc377
Get hfc236fa writing
znichollscr May 23, 2024
536c97b
Get hfc23 writing
znichollscr May 23, 2024
6e9f5d5
Get hfc236fa writing
znichollscr May 23, 2024
f22dbf6
Get hfc32 writing
znichollscr May 23, 2024
06b80d2
Get hfc245fa and hfc365mfc writing
znichollscr May 23, 2024
fdd14a6
Get hfc4310mee, nf3 and so2f2 writing [ci skip]
znichollscr May 24, 2024
b5eea05
Add single gas run script
znichollscr May 24, 2024
537b445
Get c2f6 writing
znichollscr May 24, 2024
d540bcb
Get c3f8 writing
znichollscr May 24, 2024
06c007a
Get cc4f8 writing
znichollscr May 24, 2024
ee262fc
Get ccl4 and cf4 writing [ci skip]
znichollscr May 24, 2024
eb3e836
Get other CFCs writing
znichollscr May 24, 2024
99f87bd
Finish off equivalent species
znichollscr May 24, 2024
de27731
Add numpy check to CI
znichollscr Jun 20, 2024
64db0f6
Remove action check
znichollscr Jun 20, 2024
1859db3
Update to handle new data
znichollscr Jun 20, 2024
ce34f57
Update dependencies
znichollscr Jun 20, 2024
ee8246e
Alter tolerances
znichollscr Jun 20, 2024
bb0d9ef
Alter tolerance again
znichollscr Jun 20, 2024
510c979
Check source data licences
znichollscr Jun 20, 2024
3017aae
Merge branch 'other-gases' into rest-of-gases
znichollscr Jun 20, 2024
08b1f8d
Update after merge
znichollscr Jun 20, 2024
1 change: 0 additions & 1 deletion .github/workflows/ci.yaml
@@ -40,7 +40,6 @@ jobs:
python-version: "${{ matrix.python-version }}"
venv-id: "tests-${{ runner.os }}"
poetry-dependency-install-flags: "--all-extras --only 'main,tests,coverage'"
# TODO: change this probably to capture coverage from non-regression tests, ok for now
- name: Run regression tests relevant for coverage
run: |
poetry run python scripts/write-config.py
46 changes: 46 additions & 0 deletions LICENCES_SOURCES.md
@@ -0,0 +1,46 @@
Notes on the licences of the sources we use.
This is not checked with CI or anything, so may not be up to date.

AGAGE:

- data policy: https://agage.mit.edu/data/use-agage-data
- offer co-authorship
- contact [email protected] to check


NOAA HATS:

- https://gml.noaa.gov/hats/hats_datause.html
- reciprocity, otherwise can be used


NOAA CCG:

- https://gml.noaa.gov/ccgg/data/datause.html
- reciprocity, otherwise can be used


EPICA:

- https://doi.pangaea.de/10.1594/PANGAEA.552232
- CC BY 3.0, can build upon
- https://creativecommons.org/licenses/by/3.0/

Law Dome:

- https://data.csiro.au/collection/csiro%3A37077v2
- CC BY 4.0
- https://creativecommons.org/licenses/by/4.0/

NEEM:

- https://doi.pangaea.de/10.1594/PANGAEA.899039
- CC BY 4.0
- https://creativecommons.org/licenses/by/4.0/

HadCRUT5:

- https://www.metoffice.gov.uk/hadobs/hadcrut5/
- Open Government 3 licence
- https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
- can use it for this, no worries
3,401 changes: 3,369 additions & 32 deletions dev-config.yaml

Large diffs are not rendered by default.

7 changes: 6 additions & 1 deletion dodo.py
@@ -22,6 +22,7 @@
 from __future__ import annotations

 import datetime as dt
+import logging
 import os
 import time
 from collections.abc import Iterable
@@ -57,7 +58,11 @@
 See https://pydoit.org/configuration.html#configuration-at-dodo-py
 """

-logger = setup_logging()
+logger = setup_logging(
+    stdout_level=logging.WARNING,
+    log_file=os.environ.get("DOIT_LOG_FILE", f"doit_{RUN_ID}.log"),
+    file_level=logging.INFO,
+)


 def print_key_info() -> None:
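The `dodo.py` change above sends warnings to stdout and everything from INFO upwards to a log file whose name can be overridden with the `DOIT_LOG_FILE` environment variable. The project's own `setup_logging` helper is not shown in this diff, so the function below is only a hypothetical stand-in built on the standard library, written to illustrate how two handlers with different levels could produce that behaviour. The logger name and the format string are assumptions.

```python
import logging
import sys
from typing import Optional


def setup_logging(
    stdout_level: int = logging.WARNING,
    log_file: Optional[str] = None,
    file_level: int = logging.INFO,
) -> logging.Logger:
    """Hypothetical stand-in for the project's ``setup_logging`` helper."""
    logger = logging.getLogger("doit-sketch")  # assumed name, not from the PR
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # Terminal output: only messages at stdout_level or above
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(stdout_level)
    stdout_handler.setFormatter(formatter)
    logger.addHandler(stdout_handler)

    # Optional file output: can be more verbose than the terminal
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(file_level)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger
```

Called with the arguments from the diff, a helper like this would keep the terminal quiet while still recording INFO-level progress in the per-run log file.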
@@ -82,7 +82,7 @@
 raw = pd.read_csv(
     config_retrieve.gggrn.raw_dir / filename,
     skiprows=skiprows,
-    delim_whitespace=True,
+    sep=r"\s+",
 )

 unit = gas_units[gas]
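The hunk above swaps `delim_whitespace=True` for `sep=r"\s+"`. As far as I know, `delim_whitespace` is deprecated from pandas 2.2 onwards and the regular-expression separator is the suggested replacement, which fits the dependency update in this PR. A minimal sketch of the replacement call, using made-up file content rather than a real NOAA file:

```python
import io

import pandas as pd

# Made-up whitespace-delimited content standing in for a raw data file
raw_text = """# header comment to skip
site year month value
mlo  2023 1     419.5
brw  2023   1   421.0
"""

# sep=r"\s+" splits on any run of whitespace, like delim_whitespace=True did
raw = pd.read_csv(io.StringIO(raw_text), skiprows=1, sep=r"\s+")
print(raw)
```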
@@ -5,7 +5,7 @@
 #       extension: .py
 #       format_name: percent
 #       format_version: '1.3'
-#       jupytext_version: 1.15.2
+#       jupytext_version: 1.16.1
 #   kernelspec:
 #     display_name: Python 3 (ipykernel)
 #     language: python
8 changes: 1 addition & 7 deletions notebooks/001y_process-noaa-data/0010_download.py
@@ -18,12 +18,6 @@
 # Download data from the [NOAA Global Monitoring Laboratory (GML) Carbon Cycle Greenhouse Gases (CCGG) research area](https://gml.noaa.gov/ccgg/flask.html), specifically the [data page](https://gml.noaa.gov/ccgg/data/).
 #
 # For simplicity, here we just refer to this as the NOAA network. This is sort of in line with what is done in [Forster et al., 2023](https://essd.copernicus.org/articles/15/2295/2023/essd-15-2295-2023.pdf), who call it the "NOAA Global Monitoring Laboratory (GML)" (which appears to be the name of the top-level program). Puzzlingly, this network seems to also be referred to as the [Global Greenhouse Gas Reference Network (GGGRN)](https://gml.noaa.gov/ccgg/data/) (TODO: ask someone who knows what the difference between the acronyms is meant to mean).
-#
-# To-do:
-#
-# - read old global-mean processing (also called 0010 but in a different folder) and extract any insights from there
-# - add in handling of in situ measurements too (in situ and flask measurements treated as different stations in M17)
-# - parameterise notebook so we can do same for CH4, N2O and SF6 observations

 # %% [markdown]
 # ## Imports
@@ -51,7 +45,7 @@

 # %% editable=true slideshow={"slide_type": ""} tags=["parameters"]
 config_file: str = "../../dev-config-absolute.yaml" # config file
-step_config_id: str = "n2o_hats" # config ID to select for this branch
+step_config_id: str = "co2_in-situ" # config ID to select for this branch

 # %% [markdown]
 # ## Load config
27 changes: 25 additions & 2 deletions notebooks/001y_process-noaa-data/0011_extract.py
@@ -29,6 +29,8 @@
 from local.noaa_processing import (
     read_noaa_flask_zip,
     read_noaa_hats,
+    read_noaa_hats_combined,
+    read_noaa_hats_m2_and_pr1,
     read_noaa_in_situ_zip,
 )

@@ -46,7 +48,7 @@

 # %% editable=true slideshow={"slide_type": ""} tags=["parameters"]
 config_file: str = "../../dev-config-absolute.yaml" # config file
-step_config_id: str = "n2o_hats" # config ID to select for this branch
+step_config_id: str = "cfc11_hats" # config ID to select for this branch

 # %% [markdown]
 # ## Load config
@@ -87,7 +89,28 @@
     print(df_months)

 elif config_step.source == "hats":
-    df_months = read_noaa_hats(zf, gas=config_step.gas, source=config_step.source)
+    if config_step.gas in ("n2o", "ccl4", "cfc11", "cfc113", "cfc12", "sf6"):
+        df_months = read_noaa_hats_combined(
+            zf, gas=config_step.gas, source=config_step.source
+        )
+
+    elif config_step.gas in (
+        "c2f6",
+        "cf4",
+        "halon1301",
+        "hfc125",
+        "hfc143a",
+        "hfc236fa",
+        "hfc32",
+        "nf3",
+        "so2f2",
+    ):
+        df_months = read_noaa_hats_m2_and_pr1(
+            zf, gas=config_step.gas, source=config_step.source
+        )
+
+    else:
+        df_months = read_noaa_hats(zf, gas=config_step.gas, source=config_step.source)

     print("df_months")
     print(df_months)
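The HATS branch above picks one of three readers depending on the gas. One way to keep that choice testable on its own is to put the gas groups in named sets and have a small function return which reader applies. The sketch below does this with reader names as strings, because the real functions live in `local.noaa_processing` and are not reproduced here. The set names and `select_hats_reader_name` are hypothetical and not part of the PR.

```python
# Gas groups copied from the diff above; the constant names are made up here
COMBINED_GASES = frozenset({"n2o", "ccl4", "cfc11", "cfc113", "cfc12", "sf6"})
M2_AND_PR1_GASES = frozenset(
    {
        "c2f6",
        "cf4",
        "halon1301",
        "hfc125",
        "hfc143a",
        "hfc236fa",
        "hfc32",
        "nf3",
        "so2f2",
    }
)


def select_hats_reader_name(gas: str) -> str:
    """Return the name of the reader the notebook would use for ``gas``."""
    if gas in COMBINED_GASES:
        return "read_noaa_hats_combined"
    if gas in M2_AND_PR1_GASES:
        return "read_noaa_hats_m2_and_pr1"
    # Every other gas falls back to the generic HATS reader
    return "read_noaa_hats"
```

A check such as `COMBINED_GASES.isdisjoint(M2_AND_PR1_GASES)` would then guard against a gas accidentally being listed in both groups.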
@@ -47,7 +47,7 @@

 # %% editable=true slideshow={"slide_type": ""} tags=["parameters"]
 config_file: str = "../../dev-config-absolute.yaml" # config file
-step_config_id: str = "n2o" # config ID to select for this branch
+step_config_id: str = "sf6" # config ID to select for this branch

 # %% [markdown]
 # ## Load config
45 changes: 40 additions & 5 deletions notebooks/001y_process-noaa-data/0013_process_in-situ.py
@@ -85,13 +85,48 @@
 #
 # Nice and easy as this data already has everything we need.

-# %% editable=true slideshow={"slide_type": ""}
+# %%
 monthly_dfs_with_loc = df_months[PROCESSED_DATA_COLUMNS]
+
+# %%
+if config_step.step_config_id in ["co2", "ch4"]:
+    # There is one month where there is duplicate data for MKO,
+    # presumably from moving because of the fires.
+    # We deal with this here because it is such an edge case.
+    edge_case_year_month = (2023, 7)
+    edge_case_rows_select = (
+        (monthly_dfs_with_loc["year"] == edge_case_year_month[0])
+        & (monthly_dfs_with_loc["month"] == edge_case_year_month[1])
+        & (monthly_dfs_with_loc["site_code_filename"] == "mko")
+    )
+    edge_case_rows = monthly_dfs_with_loc[edge_case_rows_select]
+    exp_n_edge_case_rows = 2
+    assert edge_case_rows.shape[0] == exp_n_edge_case_rows
+
+    # Assume that a mean is fine, it seems justifiable in overall noise
+    # and not sure what else to do...
+    edge_case_row_new = (
+        edge_case_rows.groupby(list(set(edge_case_rows.columns) - {"value"}))
+        .mean()
+        .reset_index()
+    )
+
+    monthly_dfs_with_loc = pd.concat(
+        [monthly_dfs_with_loc[~edge_case_rows_select], edge_case_row_new]
+    )
+    monthly_dfs_with_loc[
+        (monthly_dfs_with_loc["year"] == edge_case_year_month[0])
+        & (monthly_dfs_with_loc["month"] == edge_case_year_month[1])
+        & (monthly_dfs_with_loc["site_code_filename"] == "mko")
+    ]
+
+# %% editable=true slideshow={"slide_type": ""}
+duplicate_entries = monthly_dfs_with_loc[
+    ["gas", "year", "month", "site_code_filename"]
+][monthly_dfs_with_loc[["gas", "year", "month", "site_code_filename"]].duplicated()]
 assert (
-    not monthly_dfs_with_loc[["gas", "year", "month", "site_code_filename"]]
-    .duplicated()
-    .any()
-), "Duplicate entries for a station in a month"
+    duplicate_entries.shape[0] == 0
+), f"Duplicate entries for a station in a month {duplicate_entries}"
 monthly_dfs_with_loc

 # %% editable=true slideshow={"slide_type": ""}
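The edge-case handling above replaces two rows for the same station and month with their mean, then asserts that no duplicates remain. The sketch below reproduces that pattern on a tiny made-up frame. The values are invented, only a subset of the notebook's columns is used, and grouping on an explicit key list is a simplification of the "all columns except `value`" grouping in the diff.

```python
import pandas as pd

# Invented data: July 2023 appears twice for the same station
df = pd.DataFrame(
    {
        "gas": ["co2", "co2", "co2"],
        "year": [2023, 2023, 2023],
        "month": [6, 7, 7],
        "site_code_filename": ["mko", "mko", "mko"],
        "value": [420.0, 421.0, 423.0],
    }
)
key_cols = ["gas", "year", "month", "site_code_filename"]

# Select the duplicated station-month
is_edge_case = (
    (df["year"] == 2023) & (df["month"] == 7) & (df["site_code_filename"] == "mko")
)
edge_case_rows = df[is_edge_case]

# Collapse the duplicated rows into a single row holding their mean value
averaged = edge_case_rows.groupby(key_cols, as_index=False)["value"].mean()
deduplicated = pd.concat([df[~is_edge_case], averaged], ignore_index=True)

# Same style of check as the notebook: report the offending rows if any remain
duplicate_entries = deduplicated[deduplicated[key_cols].duplicated()]
assert duplicate_entries.shape[0] == 0, f"Duplicate entries: {duplicate_entries}"
```

Putting the offending rows into the assertion message, as the PR does, makes a failure much easier to diagnose than a bare "duplicates found".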
26 changes: 13 additions & 13 deletions notebooks/001y_process-noaa-data/0014_process_hats.py
@@ -46,7 +46,7 @@

 # %% editable=true slideshow={"slide_type": ""} tags=["parameters"]
 config_file: str = "../../dev-config-absolute.yaml" # config file
-step_config_id: str = "n2o" # config ID to select for this branch
+step_config_id: str = "hfc134a" # config ID to select for this branch

 # %% [markdown]
 # ## Load config
@@ -104,7 +104,7 @@
 # countries.columns.tolist()

 # %%
-colours = (
+colours = tuple(
     c
     for c in [
         "tab:blue",
@@ -121,7 +121,7 @@
         "tab:cyan",
     ]
 )
-markers = (
+markers = tuple(
     m
     for m in [
         "o",
@@ -142,14 +142,14 @@
     ]
 )

-for station, station_df in tqdman.tqdm(
-    monthly_dfs_with_loc.groupby("site_code"), desc="Stations"
+for i, (station, station_df) in tqdman.tqdm(
+    enumerate(monthly_dfs_with_loc.groupby("site_code")), desc="Stations"
 ):
     print(station_df)

     fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
-    colour = next(colours)
-    marker = next(markers)
+    colour = colours[i % len(colours)]
+    marker = markers[i % len(markers)]

     countries.plot(color="lightgray", ax=axes[0])

@@ -189,7 +189,7 @@

 # %%
 fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
-colours = (
+colours = tuple(
     c
     for c in [
         "tab:blue",
@@ -206,7 +206,7 @@
         "tab:cyan",
     ]
 )
-markers = (
+markers = tuple(
     m
     for m in [
         "o",
@@ -229,11 +229,11 @@

 countries.plot(color="lightgray", ax=axes[0])

-for station, station_df in tqdman.tqdm(
-    monthly_dfs_with_loc.groupby("site_code"), desc="Stations"
+for i, (station, station_df) in tqdman.tqdm(
+    enumerate(monthly_dfs_with_loc.groupby("site_code")), desc="Stations"
 ):
-    colour = next(colours)
-    marker = next(markers)
+    colour = colours[i % len(colours)]
+    marker = markers[i % len(markers)]

     station_df[["longitude", "latitude"]].drop_duplicates().plot(
         x="longitude",
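The plotting change above moves from generator expressions consumed with `next()` to tuples indexed by the loop counter. A generator yields each item once, so `next()` raises `StopIteration` as soon as there are more stations than colours, whereas a tuple indexed modulo its length wraps around. A minimal sketch with made-up station codes and deliberately short palettes, where each tuple is indexed by its own length:

```python
# Short, made-up palettes so that the wrap-around is visible
colours = ("tab:blue", "tab:orange", "tab:green")
markers = ("o", "x")
stations = ["alt", "brw", "mlo", "smo", "spo"]

styles = {}
for i, station in enumerate(stations):
    # Index each tuple modulo its own length so neither can run out
    colour = colours[i % len(colours)]
    marker = markers[i % len(markers)]
    styles[station] = (colour, marker)

for station, style in styles.items():
    print(station, style)
```

`itertools.cycle` would be an alternative to the modulo indexing if keeping the `next()` calls were preferred.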
15 changes: 8 additions & 7 deletions notebooks/001y_process-noaa-data/0019_noaa-network-overview.py
@@ -58,22 +58,23 @@

if config.ci:
to_show: tuple[tuple[str, str, str], ...] = (
# ("co2", "in-situ", "process_noaa_in_situ_data"),
# ("co2", "surface-flask", "process_noaa_surface_flask_data"),
("co2", "in-situ", "process_noaa_in_situ_data"),
("co2", "surface-flask", "process_noaa_surface_flask_data"),
("ch4", "in-situ", "process_noaa_in_situ_data"),
("ch4", "surface-flask", "process_noaa_surface_flask_data"),
("n2o", "surface-flask", "process_noaa_surface_flask_data"),
("n2o", "hats", "process_noaa_hats_data"),
("sf6", "hats", "process_noaa_hats_data"),
("cfc11", "hats", "process_noaa_hats_data"),
)
else:
to_show = (
# ("co2", "in-situ", "process_noaa_in_situ_data"),
# ("co2", "surface-flask", "process_noaa_surface_flask_data"),
("co2", "in-situ", "process_noaa_in_situ_data"),
("co2", "surface-flask", "process_noaa_surface_flask_data"),
("ch4", "in-situ", "process_noaa_in_situ_data"),
("ch4", "surface-flask", "process_noaa_surface_flask_data"),
("n2o", "surface-flask", "process_noaa_surface_flask_data"),
("n2o", "hats", "process_noaa_hats_data"),
# ("sf6", "surface-flask", "process_noaa_surface_flask_data"),
("sf6", "hats", "process_noaa_hats_data"),
("cfc11", "hats", "process_noaa_hats_data"),
)

gas_configs = {
2 changes: 1 addition & 1 deletion notebooks/002y_process-agage-data/0020_download-agage.py
@@ -52,7 +52,7 @@

 # %% editable=true slideshow={"slide_type": ""} tags=["parameters"]
 config_file: str = "../../dev-config-absolute.yaml" # config file
-step_config_id: str = "ccl4_gc-md_monthly" # config ID to select for this branch
+step_config_id: str = "sf6_gc-ms-medusa_monthly" # config ID to select for this branch

 # %% [markdown]
 # ## Load config