Using Kerchunk to combine datasets into datatrees #357

Answered by martindurant
abkfenris asked this question in Q&A
This is definitely something that kerchunk can do, but not with MultiZarrToZarr. Since the structure of keys in a single zarr follows the filesystem hierarchy, it is enough to rename the keys to the desired new hierarchy. For example, something like the following would do:

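import ujson

def merge_to_path(treedict):
    # treedict maps a group path (prefix) to the path of a kerchunk reference JSON file
    out = {}
    for prefix, refpath in treedict.items():
        with open(refpath) as f:  # or allow for remote paths here
            refs = ujson.load(f)
        # re-key every reference under the group's prefix
        for k, v in refs["refs"].items():
            out[f"{prefix}/{k}"] = v
    return out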

This makes a dictionary of everything, which you can then save; else you could choose to fill in a Lazy/parquet mapper, or a number of more complicated scenarios. The …
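To make the "save" step concrete, here is a minimal sketch of one way it might look. It assumes the inputs are Zarr v2 style reference sets in kerchunk's version 1 JSON layout (a top-level "refs" mapping). The `write_tree` helper, the group names, and the bucket URLs are made up for illustration, and the stdlib `json` module stands in for `ujson`. The root `.zgroup` entry is an assumption: in Zarr v2 a group is marked by a `.zgroup` key, so the new top level probably needs one to be recognized as a group.

```python
import json
import os
import tempfile


def merge_to_path(treedict):
    """Prefix every key of each reference set with its group path."""
    out = {}
    for prefix, refpath in treedict.items():
        with open(refpath) as f:
            refs = json.load(f)
        for k, v in refs["refs"].items():
            out[f"{prefix}/{k}"] = v
    return out


def write_tree(treedict, outpath):
    """Hypothetical helper: merge the reference sets and save one JSON file."""
    merged = merge_to_path(treedict)
    # Assumption: Zarr v2 layout, so the new root needs its own group marker.
    merged.setdefault(".zgroup", json.dumps({"zarr_format": 2}))
    with open(outpath, "w") as f:
        json.dump({"version": 1, "refs": merged}, f)
    return merged


# Two tiny, made-up reference sets standing in for real kerchunk output.
tmp = tempfile.mkdtemp()
treedict = {}
for name in ("model_a", "model_b"):
    refs = {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/.zarray": "{}",  # placeholder, not real array metadata
        "temp/0": [f"s3://example-bucket/{name}.nc", 0, 100],
    }
    path = os.path.join(tmp, f"{name}.json")
    with open(path, "w") as f:
        json.dump({"version": 1, "refs": refs}, f)
    treedict[name] = path

combined_path = os.path.join(tmp, "combined.json")
merged = write_tree(treedict, combined_path)
```

The combined file should then be usable like any other reference set, for example via fsspec's "reference" filesystem, with each prefix showing up as a group. Nested prefixes such as "a/b" would likely need a `.zgroup` entry for each intermediate level as well; this sketch does not add those.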

Answer selected by abkfenris