Ensure zarr.create uses writeable mode #1309
Can confirm the changes here also fix the related test failures we were seeing in Dask's CI (xref dask/dask#9736)
Codecov Report
@@            Coverage Diff            @@
##              main     #1309   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           35        35
  Lines        14148     14157    +9
=========================================
+ Hits         14148     14157    +9
Thanks a lot @jrbourbeau for a quick fix of this regression.
I'm eager to help fix the problem. At the same time, I want to understand this bug a bit better in order to avoid creating additional technical debt within zarr-python.
When I first saw @djhoese's issue, I was puzzled by the recommended fix. The example was using local file storage. How is it possible that creating arrays in local files is completely broken in zarr? Surely our test suite would have caught such a regression.
I dug deeper and realized that dask.array.to_zarr is always converting string arguments to FSMap objects (see https://github.com/dask/dask/blob/40bc376acb1f631aa80db17ce4fb779e1a888aee/dask/array/core.py#L3677-L3681) using code like:
    if isinstance(url, str):
        mapper = get_mapper(url, **storage_options)
FSMap has been effectively deprecated by the introduction of zarr.storage.FSStore (beginning with #546). FSStore has additional performance optimizations that are not available with FSMap, and therefore we always prefer to use an FSStore (although an FSMap is still a valid store, as it is a valid MutableMapping).
To address this inconsistency, #1304 recently implemented a change to automatically promote an FSMap to an FSStore (thanks @ravwojdyla! 🙌 ). That change is what revealed this latent bug. Before, the FSMap was initialized with whatever mode the user chose (or the default from fsspec, presumably mode='w'). Now we override this with the default keyword argument from normalize_store_arg, which is indeed mode='r':
Lines 144 to 147 in 4e633ad
    if isinstance(store, fsspec.FSMap):
        return FSStore(store.root,
                       fs=store.fs,
                       mode=mode,
mode is not an attribute of an FSMap, but it is an attribute of FSStore. It is used internally within Zarr to implement read-only behavior, e.g.
Lines 1409 to 1411 in 4e633ad
    def __delitem__(self, key):
        if self.mode == 'r':
            raise ReadOnlyError()
My concern is that the mode argument has a somewhat ambiguous status within Zarr. It is not supported by all stores, but it could be. That inconsistency is what opened the possibility for this bug in the first place.
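To make the interaction above concrete, here is a minimal toy model of the regression. It does not use zarr or fsspec: ToyMap, ToyStore, normalize_store_sketch, and the two create_* functions are hypothetical stand-ins, and the "after fix" variant assumes the fix amounts to passing a writeable mode during creation.

```python
class ReadOnlyError(PermissionError):
    """Toy stand-in for the error raised on writes to a read-only store."""


class ToyMap(dict):
    """Toy stand-in for an FSMap: a mutable mapping with no `mode`."""


class ToyStore(dict):
    """Toy stand-in for an FSStore: a mapping that enforces `mode`."""

    def __init__(self, data=None, mode="w"):
        self.mode = mode
        super().__init__(data or {})

    def __setitem__(self, key, value):
        if self.mode == "r":
            raise ReadOnlyError(key)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        if self.mode == "r":
            raise ReadOnlyError(key)
        super().__delitem__(key)


def normalize_store_sketch(store, mode="r"):
    # Promote the mode-less mapping to the mode-aware store. The `mode`
    # argument (default "r") now decides whether writes are allowed.
    if isinstance(store, ToyMap):
        return ToyStore(store, mode=mode)
    return store


def create_before_fix(store):
    # No mode is passed, so the promoted store silently becomes read-only.
    store = normalize_store_sketch(store)
    store[".zarray"] = b"{}"  # raises ReadOnlyError for a promoted ToyMap
    return store


def create_after_fix(store):
    # Creation always needs to write metadata, so request a writeable store.
    store = normalize_store_sketch(store, mode="w")
    store[".zarray"] = b"{}"
    return store
```

In this sketch, a store that was writeable as a ToyMap becomes read-only only because of the promotion default, which is the shape of the bug described above.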
Having written all that, I think I have convinced myself that I am fine with this fix. 🙃
The two follow-up actions I would suggest are:
- Do an audit on the status of mode in Zarr. All stores could potentially support mode in the same way that FSStore does. However, we also support a different, redundant method of specifying read-only: via the Array and Group constructors. DirectoryStore, for example, doesn't have mode at all, but it is still possible to use read-only stores on disk. Check out open_array for more details.
- Consider refactoring dask.array.to_zarr. Now that Zarr implements url parsing internally via fsspec, it should be possible to leave the store creation completely up to Zarr, rather than Dask.
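As a rough illustration of the redundancy mentioned in the first item, the sketch below enforces read-only at the array level on top of a store that has no mode at all. ToyArray and its write method are hypothetical names for illustration, not Zarr's API.

```python
class ToyArray:
    """Toy stand-in for an array wrapper that carries its own read-only flag."""

    def __init__(self, store, read_only=False):
        self.store = store
        self.read_only = read_only

    def write(self, key, value):
        # The check lives on the wrapper, so the store needs no `mode`.
        if self.read_only:
            raise PermissionError("array is read-only")
        self.store[key] = value


# A plain dict has no `mode` attribute (like DirectoryStore in this respect).
store = {}
writer = ToyArray(store)
reader = ToyArray(store, read_only=True)

writer.write("0.0", b"chunk")
```

Here the same mode-less store is writeable through one wrapper and read-only through the other, which is why a store-level mode and an array-level flag can end up expressing the same thing twice.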
@@ -429,6 +429,18 @@ def test_create_in_dict(zarr_version, at_root):
    assert isinstance(a.store, expected_store_type)


@pytest.mark.skipif(have_fsspec is False, reason="needs fsspec")
Is fsspec needed to reproduce this bug? In my reading of the code, it should only affect stores that have a mode argument in their constructor. That is:
- FSStore
- ZipStore
- DBMStore
The fact that not all stores accept the mode keyword is, to me, a Zarr code smell.
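One way such an audit could be started is by inspecting constructor signatures. The sketch below uses only the standard library; accepts_mode is a hypothetical helper, and the two toy classes stand in for real store classes (they are not Zarr's implementations).

```python
import inspect


def accepts_mode(store_cls):
    """Return True if the class constructor declares an explicit `mode` parameter.

    Note: this does not detect a `mode` that is only accepted via **kwargs.
    """
    params = inspect.signature(store_cls.__init__).parameters
    return "mode" in params


class ToyFSStore:
    """Toy class whose constructor takes `mode`."""

    def __init__(self, url, mode="w"):
        self.url = url
        self.mode = mode


class ToyDirectoryStore:
    """Toy class whose constructor has no `mode`."""

    def __init__(self, path):
        self.path = path


report = {cls.__name__: accepts_mode(cls) for cls in (ToyFSStore, ToyDirectoryStore)}
```

Applied to the real store classes, the same check would presumably list which constructors expose mode, though that result is not verified here.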
@jrbourbeau - if you can push a release notes update, I will merge this.
Thanks for the thorough thoughts @rabernat . Indeed, storage code logic has been slowly drifting into zarr, which seems like the right direction to go in.
As discussed in zarr-developers/zarr-python#1309 (review), Zarr can now handle the creation of more types of store objects from URLs thanks to the FSStore class. This means that usually Dask can just pass through the store URL with no modifications (e.g. calling get_mapper). The only exception is when the user specifies storage_options explicitly.
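The pass-through rule described above could look roughly like the following. This is a sketch under assumptions, not Dask's actual code: choose_store_arg and its make_mapper parameter are hypothetical, with make_mapper standing in for something like get_mapper.

```python
def choose_store_arg(url, storage_options=None, make_mapper=None):
    """Decide what to hand to the array-creation call.

    Hypothetical helper: a string URL is passed through untouched unless the
    caller supplied explicit storage_options, in which case a mapper is built.
    """
    if isinstance(url, str) and storage_options:
        if make_mapper is None:
            raise ValueError("make_mapper is required when storage_options are given")
        return make_mapper(url, **storage_options)
    # Plain URLs and already-constructed store objects go through as-is.
    return url


def fake_mapper(url, **options):
    # Stand-in for a real mapper factory; it just records its inputs.
    return {"url": url, "options": options}


passthrough = choose_store_arg("data.zarr")
mapped = choose_store_arg("s3://bucket/data.zarr", {"anon": True}, make_mapper=fake_mapper)
```

Keeping the mapper construction behind a single branch means the common case never creates a store object outside Zarr.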
Just realizing that 2.13.4 is likely broken since #1304 is included but this is not!
Thanks @rabernat @joshmoore!
* Ensure zarr.create uses writeable mode
* Update release.rst
  Added release notes for [#1309](zarr-developers/zarr-python#1309)
* Switch to bug fix
Co-authored-by: Josh Moore
Co-authored-by: Sanket Verma
Closes #1306
cc @ravwojdyla @djhoese
TODO:
- Add docstrings and API docs for any new/modified user-facing classes and functions
- New/modified features documented in docs/tutorial.rst