
zarr.core.Array.chunk_store changes type after zarr=2.13.3 (after PR #1304) #1362

Closed
valeriupredoi opened this issue Mar 6, 2023 · 21 comments


valeriupredoi commented Mar 6, 2023

Hey guys, just a heads up that the chunk_store object returned by the chunk_store property of a Zarr array changes type for objects (Zarr arrays) created by loading files with fsspec after zarr=2.13.3, i.e. after PR #1304 was merged:

  • with 2.13.3 my chunk_store (see below for reproducible code) is <KVStore: <fsspec.mapping.FSMap object at 0x7feba08afee0> at 0x7feba08aff40>
  • with 2.13.6 same chunk_store object is <zarr.storage.FSStore object at 0x7fd3ea52bf40>

This causes issues when one wants to access attributes of the KVStore object like _mutable_mapping. If this is intended, please both mention it in the release notes (as a breaking change) and provide a solution for how to still be able to use _mutable_mapping; if it's a bug - then you know what to do 😁 🍺

Here's my minimal code - like super minimal 😁 - and kudos to Zarr for preserving structure in such a Null case

import fsspec
import zarr

def open_zarr_group():
    url = fsspec.filesystem("reference", fo="test.json")
    mapper = url.get_mapper("")
    zarr_group = zarr.open_group(mapper)
    print(zarr_group)
    print(zarr_group.chunk_store)


if __name__ == '__main__':
    open_zarr_group()

where test.json can be an empty JSON file, e.g. containing just {}
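For completeness, the placeholder file can be created from a POSIX shell like so:

```shell
# create a file that is empty by JSON standards (contains only {})
echo '{}' > test.json
```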

Let me know what I can do to help BTW, and many thanks for all your good work! 🍻


rabernat commented Mar 6, 2023

Hi @valeriupredoi - thanks for reporting this potential bug! It looks like #1304 may have caused more problems than it fixed. 🤦

We'd love to be able to act on this, but I would really like to first encourage you to make your bug report more minimal. It would be great if you could remove the references to kerchunk, local file paths, etc. in your example. Here is a great guide for how to do this.

valeriupredoi (Author)

@rabernat cheers muchly! Absolutely, will do that in a sec - for now we've pinned zarr so no rush at our end, but I'll try to minimize the test so you guys can reproduce the issue without extra dependencies 🍺

valeriupredoi (Author)

@rabernat I have now edited the minimal test case - my apologies, I didn't know Zarr was so flexible that it really doesn't need any data whatsoever (everything can be mocked), with the functional types and structure still preserved - well done! 🍻


rabernat commented Mar 6, 2023

Hi @valeriupredoi - thanks for updating your example! However, it is still not reproducible.

When I tried to run your code, I got the following error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[1], line 13
      9     print(zarr_group.chunk_store)
     12 if __name__ == '__main__':
---> 13     open_zarr_group()

Cell In[1], line 5, in open_zarr_group()
      4 def open_zarr_group():
----> 5     url = fsspec.filesystem("reference", fo="test.json")
      6     mapper = url.get_mapper("")
      7     zarr_group = zarr.open_group(mapper)

...

FileNotFoundError: [Errno 2] No such file or directory: '/home/rabernat/test/test.json'

valeriupredoi (Author)

hi @rabernat apols, I mentioned it in the description but it's not clear - you'll have to create an empty test.json, i.e. empty by JSON standards: just put {} in the file 👍


rabernat commented Mar 6, 2023

I tried this:

import json

import fsspec
import zarr

def open_zarr_group():
    with open("test.json", mode="w") as fp:
        json.dump({}, fp)
    url = fsspec.filesystem("reference", fo="test.json")
    mapper = url.get_mapper("")
    zarr_group = zarr.open_group(mapper)
    print(zarr_group)
    print(zarr_group.chunk_store)

open_zarr_group()

And got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 18
     13     print(zarr_group)
     14     print(zarr_group.chunk_store)
---> 18 open_zarr_group()

Cell In[6], line 12, in open_zarr_group()
     10 url = fsspec.filesystem("reference", fo="test.json")
     11 mapper = url.get_mapper("")
---> 12 zarr_group = zarr.open_group(mapper)
     13 print(zarr_group)
     14 print(zarr_group.chunk_store)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/zarr/hierarchy.py:1465, in open_group(store, mode, cache_attrs, synchronizer, path, chunk_store, storage_options, zarr_version, meta_array)
   1462 # determine read only status
   1463 read_only = mode == 'r'
-> 1465 return Group(store, read_only=read_only, cache_attrs=cache_attrs,
   1466              synchronizer=synchronizer, path=path, chunk_store=chunk_store,
   1467              zarr_version=zarr_version, meta_array=meta_array)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/zarr/hierarchy.py:164, in Group.__init__(self, store, path, read_only, chunk_store, cache_attrs, synchronizer, zarr_version, meta_array)
    162     mkey = _prefix_to_group_key(self._store, self._key_prefix)
    163     assert not mkey.endswith("root/.group")
--> 164     meta_bytes = store[mkey]
    165 except KeyError:
    166     if self._version == 2:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/zarr/storage.py:1393, in FSStore.__getitem__(self, key)
   1391 key = self._normalize_key(key)
   1392 try:
-> 1393     return self.map[key]
   1394 except self.exceptions as e:
   1395     raise KeyError(key) from e

File /srv/conda/envs/notebook/lib/python3.10/site-packages/fsspec/mapping.py:143, in FSMap.__getitem__(self, key, default)
    141 k = self._key_to_str(key)
    142 try:
--> 143     result = self.fs.cat(k)
    144 except self.missing_exceptions:
    145     if default is not None:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/fsspec/implementations/reference.py:313, in ReferenceFileSystem.cat(self, path, recursive, on_error, **kwargs)
    311 for proto, paths in proto_dict.items():
    312     fs = self.fss[proto]
--> 313     urls, starts, ends = zip(*[self._cat_common(p) for p in paths])
    314     urls2 = []
    315     starts2 = []

File /srv/conda/envs/notebook/lib/python3.10/site-packages/fsspec/implementations/reference.py:313, in <listcomp>(.0)
    311 for proto, paths in proto_dict.items():
    312     fs = self.fss[proto]
--> 313     urls, starts, ends = zip(*[self._cat_common(p) for p in paths])
    314     urls2 = []
    315     starts2 = []

File /srv/conda/envs/notebook/lib/python3.10/site-packages/fsspec/implementations/reference.py:230, in ReferenceFileSystem._cat_common(self, path, start, end)
    228     start1, end1 = start, end
    229 else:
--> 230     url, start0, size = part
    231     logger.debug(f"Reference: {path} => {url}, offset {start0}, size {size}")
    232     end0 = start0 + size

ValueError: too many values to unpack (expected 3)


rabernat commented Mar 6, 2023

Does this example really need the reference filesystem? Here is one without it

import fsspec
import zarr

fs = fsspec.filesystem("file")
mapper = fs.get_mapper("tmp.zarr")
zarr_group = zarr.open_group(mapper)
print(zarr_group)
# <zarr.hierarchy.Group '/'>
print(zarr_group.chunk_store)
# <zarr.storage.FSStore object at 0x7fe3b15dbfd0>

Does this reproduce the problem?

valeriupredoi (Author)

> Does this example really need the reference filesystem? Here is one without it
>
> import fsspec
> import zarr
>
> fs = fsspec.filesystem("file")
> mapper = fs.get_mapper("tmp.zarr")
> zarr_group = zarr.open_group(mapper)
> print(zarr_group)
> # <zarr.hierarchy.Group '/'>
> print(zarr_group.chunk_store)
> # <zarr.storage.FSStore object at 0x7fe3b15dbfd0>
>
> Does this reproduce the problem?

Indeed that should work! I tried the code in your previous comment (where the JSON file is created with the context manager) and that worked for me, but it's true, I don't think one needs the reference filesystem


rabernat commented Mar 6, 2023

Ok, glad we found a truly minimum reproducer! 😊

Now, can you help me understand what exactly is the problem? The FSMap has correctly been promoted to an FSStore. This object will give much better performance on concurrent read / write operations, as discussed in #1296.

Can you explain why the FSStore object is problematic for you?


valeriupredoi commented Mar 6, 2023

> Ok, glad we found a truly minimum reproducer! 😊

and a really nice proof Zarr supports Null testing nicely 😁

> Can you explain why the FSStore object is problematic for you?

We are treating and using chunk_store as a KVStore, so we are using one of its methods, _mutable_mapping, which is no longer an attribute of it; see e.g. this test failure: AttributeError: 'FSStore' object has no attribute '_mutable_mapping'. I didn't know this change was intended - it would be nice if it were communicated in the release notes/changelog, together with a suggestion for how to move to the new method/API
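In the meantime, a small compatibility shim can cope with both store flavours. This is a sketch based on the behaviour described in this thread (the helper name get_fs is ours, and the attribute probing is best-effort, not an official zarr API):

```python
from types import SimpleNamespace

def get_fs(chunk_store):
    """Best-effort retrieval of the underlying fsspec filesystem.

    Handles both an FSStore (zarr >= 2.13.6, which exposes .fs directly) and
    a KVStore wrapping an FSMap (zarr <= 2.13.3, where the mapping sits in
    the private _mutable_mapping attribute).
    """
    if hasattr(chunk_store, "fs"):  # FSStore-style store
        return chunk_store.fs
    mapping = getattr(chunk_store, "_mutable_mapping", None)  # KVStore-style
    if mapping is not None and hasattr(mapping, "fs"):
        return mapping.fs
    raise TypeError(f"cannot extract a filesystem from {type(chunk_store)!r}")

# Sanity check with stand-ins for the two store flavours:
new_style = SimpleNamespace(fs="fs-object")
old_style = SimpleNamespace(_mutable_mapping=SimpleNamespace(fs="fs-object"))
assert get_fs(new_style) == get_fs(old_style) == "fs-object"
```

With such a shim, the `fsref = ...chunk_store...fs.references` lookup discussed below works on either side of the version boundary.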


rabernat commented Mar 6, 2023

> we are using one of its methods _mutable_mapping

Ok I understand. Thanks for re-explaining!

In general, FSStore and FSMap implement a very similar public interface. They are both MutableMappings, so you can treat them as dictionaries and use getitem / setitem syntax on them. FSStore has a few public methods that FSMap does not, like getitems and setitems.
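That shared interface means store-agnostic code can stick to plain mapping operations. A dependency-free sketch, using a dict as a stand-in for any of the stores:

```python
from collections.abc import MutableMapping

def copy_store(src: MutableMapping, dst: MutableMapping) -> None:
    """Copy all entries between stores using only getitem/setitem/iteration -
    the interface shared by dict, FSMap, FSStore and KVStore alike."""
    for key in list(src):
        dst[key] = src[key]

# Demo with plain dicts standing in for two stores:
src = {".zgroup": b'{"zarr_format": 2}'}
dst: dict = {}
copy_store(src, dst)
assert dst == src
```

Code written this way keeps working when a library swaps one MutableMapping implementation for another, which is exactly the kind of change this issue tripped over.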

> we are using one of its methods _mutable_mapping,

._mutable_mapping is a private attribute. As developers, we do not make any guarantees about the stability of private attributes. They may change or disappear at any time. So I would suggest that you do not rely on them in your user code.

Can you explain why you were using the _mutable_mapping attribute? Maybe we can find a workaround.


valeriupredoi commented Mar 6, 2023

@rabernat that's a very good question - as I understand it, we use it to get the mapping of the input netCDF4 files onto the Zarr format, i.e. a full image of the data and metadata as Zarr would see it:

fsref = self.zds.chunk_store._mutable_mapping.fs.references

where self.zds is a Zarr array - i.e. the FS reference for the file. My colleague @bnlawrence can tell you more, since he implemented this method, but that should be the gist of it. If there is any way to grab this dictionary from FSStore, we'd gladly use that, especially if it were a public function 👍


rabernat commented Mar 6, 2023

You should be able to access the underlying fsspec filesystem object via the .fs attribute of the FSStore object. So zarr_group.chunk_store.fs.references should work.

Looking at the source code is probably helpful - the relevant class is FSStore(Store) in zarr/storage.py.

👋 @bnlawrence! Lovely to see you around these parts. I'd love to learn what you are up to and how we might collaborate around Zarr / kerchunk / etc.

valeriupredoi (Author)

cheers @rabernat - I'll have a look, many thanks for your help so far 🍺

valeriupredoi (Author)

whoa! It turns out that in the new configuration one doesn't need to access the _mutable_mapping attribute after all - this does it perfectly:

fsref = self.zds.chunk_store.fs.references

Cheers, @rabernat 🍺 I'll keep this open in case you still need it for bookkeeping and/or @bnlawrence's reply 😁


rabernat commented Mar 6, 2023

We can close the issue but feel free to keep the conversation going!


bnlawrence commented Mar 17, 2023

Ah, sorry @rabernat - @valeriupredoi has just pointed this thread out to me. I have to admit to mostly silencing my GitHub notifications, as I can't cope with the information onslaught.

For what it's worth, this is part of building a tool to allow the file system (or the S3 server) to do basic reductions on chunks rather than full chunk loads (to avoid data movement). We have four pieces in flight at the moment: a Python client application (cf-python, of course), some Python middleware (which uses this piece of the zarr ecosystem), a POSIX server, and an S3 proxy server ... but until we put them all together (due in Q2/Q3 this year), there's not much to show or talk about.

We absolutely intend the middleware to be reusable by any client application/library (including xarray), but until we have it working enough (i.e. with the storage clients as well) to sell the idea, we're not talking much :-)

rabernat (Contributor)

Very interesting. I would check out https://zarr.dev/zeps/draft/ZEP0005.html, by @hailiangzhang and colleagues at NASA Goddard. It aims to accomplish a very similar thing via a Zarr spec extension. Discussion at zarr-developers/zarr-specs#205

hailiangzhang (Contributor)

Thanks for referring to my proposal, @rabernat!
@bnlawrence, this proposal introduces a small, adjustable amount of extra storage, but it should make large-volume data reduction/aggregation services much faster and cheaper, since we only need to extract the chunks at the boundary. If this appears related to what you are trying to achieve, feel free to join the Zarr community call next Wed, when I plan to do a walkthrough. Thanks!

bnlawrence

Our approach is to push the computation of partial sums etc into the storage itself (which has plenty of compute for such tasks, given that it can do erasure coding etc). What is probably in common with our two approaches is how we would want to tell Dask when it can avoid having to do these calculations itself. @valeriupredoi Perhaps you or @davidhassell could see if you can join that call? If not, maybe we can set something specific up for a conversation - it is a bit niche :-)

hailiangzhang (Contributor)

> Our approach is to push the computation of partial sums etc into the storage itself (which has plenty of compute for such tasks, given that it can do erasure coding etc). What is probably in common with our two approaches is how we would want to tell Dask when it can avoid having to do these calculations itself. @valeriupredoi Perhaps you or @davidhassell could see if you can join that call? If not, maybe we can set something specific up for a conversation - it is a bit niche :-)

Thanks for the info, @bnlawrence! Yes, there may be something in common between our approaches (at least it looks like we are trying to achieve a similar goal :) ). Do you happen to have any links/resources describing your approach in more detail?

4 participants