Rechunk Pressure level data in lat lon dataset #69

Open
loliverhennigh opened this issue Feb 29, 2024 · 2 comments

Comments

@loliverhennigh

I have been using the lat-lon dataset at gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3. It has been extremely helpful for setting up different projects. I am wondering whether it would be possible to rechunk the pressure-level data. Currently all pressure levels are stored in a single chunk, so subsampling levels still fetches the entire chunk, which wastes bandwidth and significantly slows down reads. Since this is object storage, we could ideally use much smaller chunks, with each chunk holding a single lat-lon grid. What do you think?
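To put rough numbers on this, here is a back-of-the-envelope sketch of the uncompressed chunk size. The shape is an assumption read off the dataset name (37 levels, 0.25 degree grid, one time step per chunk) and float32 values are assumed too; none of this is checked against the store's actual metadata, and compression will shrink the real transfer.

```python
# Assumed layout for one pressure-level variable (not verified against the store):
# 37 levels, a 721 x 1440 (0.25 degree) grid, float32 values, one time step per chunk.
LEVELS, LAT, LON = 37, 721, 1440
BYTES_PER_VALUE = 4  # float32 (assumption)


def chunk_bytes(chunk_shape, itemsize=BYTES_PER_VALUE):
    """Return the uncompressed size in bytes of one chunk with the given shape."""
    n = 1
    for dim in chunk_shape:
        n *= dim
    return n * itemsize


current = chunk_bytes((1, LEVELS, LAT, LON))  # all levels in one chunk
per_level = chunk_bytes((1, 1, LAT, LON))     # hypothetical: one level per chunk

print(f"current chunk:   {current / 1e6:.1f} MB")
print(f"per-level chunk: {per_level / 1e6:.1f} MB")
print(f"overfetch when reading a single level: {current // per_level}x")
```

Under these assumptions, reading a single level for one time step pulls roughly 154 MB uncompressed instead of about 4 MB.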

@shoyer
Collaborator

shoyer commented Mar 27, 2024

This is a lot of data, so I don't think we're going to store another duplicate version of this dataset. But there are a number of tools for rechunking the data yourself; see, e.g., rechunker or xarray-beam.

@MrTarantoga

I have a similar issue. If I am only interested in one region of the world and a subset of the data, I would have to download 1 PB to get only a few hundred megabytes. I do not think that is the idea.

So is there no way to download only part of a remote chunk? Do we always have to download the full chunk and then rechunk locally?
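
For what it is worth, my understanding is that a chunked reader fetches only the chunks that intersect the requested selection, so the cost is whole chunks for the selected time steps rather than the whole dataset. A minimal sketch of that estimate follows; the chunk shape, the float32 item size, and the example selection are all assumptions for illustration, not values read from the store.

```python
ITEMSIZE = 4  # float32 (assumption)


def chunks_touched(selection, chunk_shape):
    """Count the chunks that intersect a selection.

    selection: one (start, stop) index range per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # index of the first chunk touched
        last = (stop - 1) // size    # index of the last chunk touched
        count *= last - first + 1
    return count


def nbytes(shape, itemsize=ITEMSIZE):
    """Return the uncompressed size in bytes of an array with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


# Hypothetical request: 24 hourly steps, 1 level, a 40 x 80 point region.
# Dimension order assumed to be (time, level, latitude, longitude).
selection = [(0, 24), (10, 11), (200, 240), (600, 680)]
wanted = nbytes([stop - start for start, stop in selection])

# Assumed current layout: one time step, all 37 levels, full 721 x 1440 grid.
current_layout = (1, 37, 721, 1440)
fetched = chunks_touched(selection, current_layout) * nbytes(current_layout)

print(f"wanted:  {wanted / 1e6:.2f} MB")
print(f"fetched: {fetched / 1e9:.2f} GB (uncompressed, before any decompression savings)")
```

With these made-up numbers the request needs about 0.3 MB but touches 24 chunks, roughly 3.7 GB uncompressed. That is a large overfetch, but it scales with the number of time steps selected, not with the size of the full archive.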
