nc2zarr

A Python tool that converts multiple NetCDF files to single Zarr datasets.

Create Python environment

$ conda install -n base -c conda-forge mamba
$ cd nc2zarr
$ mamba env create

Install nc2zarr from Sources

$ cd nc2zarr
$ conda activate nc2zarr
$ python setup.py develop

Testing and Test Coverage

$ pytest --cov nc2zarr --cov-report=html tests

Usage

$ nc2zarr --help
Usage: nc2zarr [OPTIONS] [INPUT_FILE ...]

  Reads one or more input datasets and writes or appends them to a single
  Zarr output dataset.

  INPUT_FILE may refer to a NetCDF file, or Zarr dataset, or a glob that
  identifies multiple paths, e.g. "L3_SST/**/*.nc".

  OUTPUT_PATH must be directory which will contain the output Zarr dataset,
  e.g. "L3_SST.zarr".

  CONFIG_FILE must be in YAML format. It comprises the optional objects
  "input", "process", and "output". See nc2zarr/res/config-template.yml for
  a template file that describes the format. Multiple --config options may
  be passed as a chain to allow for reuse of credentials and other common
  parameters. Contained configuration objects are recursively merged, lists
  are appended, and other values overwrite each other from left to right.
  For example:

  nc2zarr -c s3.yml -c common.yml -c inputs-01.yml -o out-01.zarr
  nc2zarr -c s3.yml -c common.yml -c inputs-02.yml -o out-02.zarr
  nc2zarr out-01.zarr out-02.zarr -o final.zarr

  Command line arguments and options have precedence over other
  configurations and thus override settings in any CONFIG_FILE:

  [--finalize-only] overrides /finalize_only
  [--dry-run] overrides /dry_run
  [--verbose] overrides /verbosity

  [INPUT_FILE ...] overrides /input/paths in CONFIG_FILE
  [--multi-file] overrides /input/multi_file
  [--concat-dim] overrides /input/concat_dim
  [--decode-cf] overrides /input/decode_cf
  [--sort-by] overrides /input/sort_by

  [--output OUTPUT_FILE] overrides /output/path
  [--overwrite] overrides /output/overwrite
  [--append] overrides /output/append
  [--adjust-metadata] overrides /output/adjust_metadata

Options:
  -c, --config CONFIG_FILE   Configuration file (YAML). Multiple may be given.
  -o, --output OUTPUT_PATH   Output name. Defaults to "out.zarr".
  -d, --concat-dim DIM_NAME  Dimension for concatenation. Defaults to "time".
  -m, --multi-file           Open multiple input files as one block. Works for
                             NetCDF files only. Use --concat-dim to specify
                             the dimension for concatenation.

  -w, --overwrite            Overwrite existing OUTPUT_PATH. If OUTPUT_PATH
                             does not exist, the option has no effect. Cannot
                             be used with --append.

  -a, --append               Append inputs to existing OUTPUT_PATH. If
                             OUTPUT_PATH does not exist, the option has no
                             effect. Cannot be used with --overwrite.

  --decode-cf                Decode variables according to CF conventions.
                             Caution: array data may be converted to floating
                             point type if a "_FillValue" attribute is
                             present.

  -s, --sort-by [path|name]  Sort input files by specified property.
  --adjust-metadata          Adjust metadata attributes after the last
                             write/append step.

  --finalize-only            Whether to just run "finalize" tasks on an
                             existing output dataset. Currently, this updates
                             the metadata only, given that configuration
                             output/adjust_metadata is set or output/metadata
                             is not empty. See also option --adjust-metadata.

  -d, --dry-run              Open and process inputs only, omit data writing.
  -v, --verbose              Print more output. Use twice for even more
                             output.

  --version                  Show version number and exit.
  --help                     Show this message and exit.

Configuration file format

The format of the configuration files passed via the --config option is described in the configuration template, nc2zarr/res/config-template.yml.
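
The merge rules stated in the usage text (objects are merged recursively, lists are appended, other values are overwritten from left to right) can be illustrated with a small sketch. This is not nc2zarr's own code: the function names `merge_configs` and `merge_config_chain` are hypothetical, and the sketch assumes each configuration file has already been parsed from YAML into a plain Python dict.

```python
from functools import reduce


def merge_configs(left, right):
    """Return a new dict with `right` merged into `left` (hypothetical helper)."""
    merged = dict(left)
    for key, value in right.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Nested configuration objects are merged recursively.
            merged[key] = merge_configs(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Lists are appended rather than replaced.
            merged[key] = current + value
        else:
            # Any other value from the right-hand config wins.
            merged[key] = value
    return merged


def merge_config_chain(configs):
    """Merge a sequence of config dicts from left to right."""
    return reduce(merge_configs, configs, {})


if __name__ == "__main__":
    # Illustrative values only; the keys follow the paths named in the usage text.
    common = {"input": {"paths": ["a.nc"]}, "output": {"overwrite": False}}
    specific = {"input": {"paths": ["b.nc"]}, "output": {"overwrite": True}}
    print(merge_config_chain([common, specific]))
```

Under these rules a later file in the chain can add input paths to those of an earlier file, while a scalar setting such as /output/overwrite takes the value from the last file that sets it.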

Examples

Convert multiple NetCDFs to single Zarr:

$ nc2zarr -o outputs/SST.zarr inputs/**/SST-*.nc

Append single NetCDF to an existing Zarr:

$ nc2zarr -a -o outputs/SST.zarr inputs/2020/SST-20200610.nc

Concatenate multiple Zarrs to a new Zarr:

$ nc2zarr -o outputs/SST.zarr outputs/SST-part1.zarr outputs/SST-part2.zarr

Append one Zarr to existing Zarr:

$ nc2zarr -a -o outputs/SST.zarr outputs/SST-part3.zarr
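
The first example relies on a recursive glob, and the --sort-by option accepts either path or name. The following sketch, using only the Python standard library, shows how such a pattern can expand and how the two sort orders can differ. It is an illustration under stated assumptions, not nc2zarr's implementation: the function `expand_inputs` is hypothetical, and "name" is assumed here to mean the file's base name.

```python
import glob
import os


def expand_inputs(patterns, sort_by=None):
    """Expand glob patterns into a list of paths (hypothetical helper)."""
    paths = []
    for pattern in patterns:
        # recursive=True lets "**" match any number of nested directories.
        paths.extend(glob.glob(pattern, recursive=True))
    if sort_by == "path":
        # Order by the full path string.
        paths.sort()
    elif sort_by == "name":
        # Order by the file name only, ignoring the directories (assumption).
        paths.sort(key=os.path.basename)
    return paths


if __name__ == "__main__":
    for path in expand_inputs(["inputs/**/SST-*.nc"], sort_by="name"):
        print(path)
```

When file names carry a date stamp, as in SST-20200610.nc, sorting by name orders the inputs chronologically even if they are spread over differently named directories.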

Custom processors

nc2zarr's built-in processors can be expanded with custom processors, Python functions which modify the dataset at particular points in the conversion pipeline. A processor function takes an xarray.Dataset as an argument and returns an xarray.Dataset as its result. A processor is specified in the configuration file as <MODULE_NAME>:<FUNCTION_NAME>, so for example the processor specification mymodule:myfunction could refer to a function defined in a file mymodule.py with the following contents:

def myfunction(dataset):
    dataset.attrs["greeting"] = "Hello world!"
    return dataset

This processor function adds a predefined attribute to the dataset (modifying it in-place), then returns the modified dataset.
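
If in-place modification is not wanted, a processor can instead return a new dataset. The variant below is a sketch that relies on xarray's `Dataset.assign_attrs` method, which returns a new object with the extra attributes and leaves the input untouched. The function name `add_greeting` is made up for this illustration.

```python
def add_greeting(dataset):
    """Return a copy of `dataset` carrying an additional "greeting" attribute."""
    # assign_attrs does not modify `dataset`; it returns a new object.
    return dataset.assign_attrs(greeting="Hello world!")
```

Saved in mymodule.py, it would be referenced as mymodule:add_greeting.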

There are three points at which processors may be run:

| Section | Parameter name | When is the processor run? |
|---|---|---|
| `input` | `custom_preprocessor` | After variable selection |
| `process` | `custom_processor` | After variable renaming, before rechunking |
| `output` | `custom_postprocessor` | Before writing data |

See the template configuration file for more details of syntax. The module is searched for on Python's current search path, so it will usually be necessary to ensure that the parent directories of all processor modules are listed in the PYTHONPATH environment variable, e.g. by executing

export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}/path/to/module/directory/"

before running nc2zarr. See the Python documentation for more details on PYTHONPATH.
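
To make the role of the search path concrete, here is a minimal sketch of how a <MODULE_NAME>:<FUNCTION_NAME> specification could be resolved with the standard library. The helper `load_processor` is hypothetical and is not taken from nc2zarr's source. It only demonstrates that `importlib.import_module` looks the module up on `sys.path`, which includes the directories listed in PYTHONPATH.

```python
import importlib


def load_processor(spec):
    """Resolve "module:function" to a callable (hypothetical helper)."""
    module_name, _, function_name = spec.partition(":")
    if not module_name or not function_name:
        raise ValueError(f"expected <MODULE_NAME>:<FUNCTION_NAME>, got {spec!r}")
    # Raises ModuleNotFoundError if the module is not on the search path.
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module has no such attribute.
    function = getattr(module, function_name)
    if not callable(function):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return function


if __name__ == "__main__":
    # A standard-library function stands in for a real processor here.
    basename = load_processor("os.path:basename")
    print(basename("inputs/2020/SST-20200610.nc"))
```

If the directory containing mymodule.py is missing from the search path, the import step fails with ModuleNotFoundError, which is the typical symptom of an unset or incomplete PYTHONPATH.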
