Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure zarr.create uses writeable mode #1309

Merged
merged 4 commits into from
Jan 16, 2023

Conversation

jrbourbeau
Copy link
Member

@jrbourbeau jrbourbeau commented Dec 23, 2022

Closes #1306

cc @ravwojdyla @djhoese

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Dec 23, 2022
@jrbourbeau
Copy link
Member Author

jrbourbeau commented Dec 23, 2022

Can confirm the changes here also fix the related test failures we were seeing in Dask's CI (xref dask/dask#9736)

@jrbourbeau jrbourbeau self-assigned this Dec 23, 2022
@codecov
Copy link

codecov bot commented Dec 23, 2022

Codecov Report

Merging #1309 (381111b) into main (4e633ad) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #1309   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           35        35           
  Lines        14148     14157    +9     
=========================================
+ Hits         14148     14157    +9     
Impacted Files Coverage Δ
zarr/creation.py 100.00% <100.00%> (ø)
zarr/tests/test_creation.py 100.00% <100.00%> (ø)

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @jrbourbeau for a quick fix of this regression.

I'm eager to help fix the problem. At the same time, I want to understand this bug a bit better in order to avoid creating additional technical debt within zarr-python.

When I first saw @djhoese's issue, I was puzzled by the recommend fix. The example was using local file storage. How is it possible that creating arrays in local files is completely broken in zarr? Surely our test suite would have caught such a regression.

I dug deeper and realized that dask.array.to_zarr is always converting string arguments to FSMap objects (see https://github.com/dask/dask/blob/40bc376acb1f631aa80db17ce4fb779e1a888aee/dask/array/core.py#L3677-L3681) using code like

    if isinstance(url, str):
        mapper = get_mapper(url, **storage_options)

FSMap has been effectively deprecated by the introduction of zarr.storage.FSStore (beginning with #546). FSStore has additional performance optimization that are not available with FSMap, and therefore we always prefer to use an FSStore (although an FSMap is still a valid store, as it is a valid MutableMapping).

To address this inconsistency, #1304 recently implemented a change to automatically promote an FSMap to an FSStore (thanks @ravwojdyla! 🙌 ). That change is what revealed this latent bug. Before, the FSMap was initialized with whatever mode the user chose (or the default from fsspec, presumably mode='w'). Now we override this with the default keyword argument from normalize_store_arg which is indeed mode='r':

zarr-python/zarr/storage.py

Lines 144 to 147 in 4e633ad

if isinstance(store, fsspec.FSMap):
return FSStore(store.root,
fs=store.fs,
mode=mode,

mode is not an attribute of an FSMap, but it is an attribute of FSStore. It is used internally within Zarr to implement read-only behavior, e.g.

zarr-python/zarr/storage.py

Lines 1409 to 1411 in 4e633ad

def __delitem__(self, key):
if self.mode == 'r':
raise ReadOnlyError()

My concern is that the mode argument has a somewhat ambiguous status within Zarr. It is not supported by all stores, but it could be. That inconsistency is what opened the possibility for this bug in the first place.

Having written all that, I think I have convinced myself that I am fine with this fix. 🙃

The two follow-up actions I would suggest are:

  • Do an audit on the status of mode in Zarr. All stores could potentially support mode in the same way that FSStore does. However, we also support a different, redundant method of specifying read-only: via the Array and Group constructors. DirectoryStore, for example, doesn't have mode at all, but it is still possible to use read-only stores on disk. Check out open_array for more details.
  • Consider refactoring dask.array.to_zarr. Now that Zarr implements url parsing internally via fsspec, it should be possible to leave the store creation completely up to Zarr, rather than Dask.

@@ -429,6 +429,18 @@ def test_create_in_dict(zarr_version, at_root):
assert isinstance(a.store, expected_store_type)


@pytest.mark.skipif(have_fsspec is False, reason="needs fsspec")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is fsspec needed to reproduce this bug? In my reading of the code, it it should only affect stores that have a mode argument in their constructor. That is:

  • FSStore
  • ZipStore
  • DBMStore

The fact that not all stores accept the mode keyword it, to me, a Zarr code smell.

@rabernat
Copy link
Contributor

@jrbourbeau - if you can push a release notes update, I will merge this.

@martindurant
Copy link
Member

Thanks for the thorough thoughts @rabernat . Indeed, storage code logic has been slowly drifting into zarr, which seems like the right direction to go in.

mrocklin pushed a commit to dask/dask that referenced this pull request Dec 30, 2022
As discussed in zarr-developers/zarr-python#1309 (review), Zarr can now handle a the creation of more types of store object from URLs thanks to the FSStore class. This means that usually Dask can just pass through the store URL with no modifications (e.g. calling get_mapper). The only exception is when the user specifies storage_options explicitly
@joshmoore
Copy link
Member

Just realizing that 2.13.4 is likely broken since #1304 is included but this is not!

@joshmoore joshmoore mentioned this pull request Jan 16, 2023
5 tasks
@MSanKeys963 MSanKeys963 mentioned this pull request Jan 16, 2023
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jan 16, 2023
@joshmoore joshmoore merged commit 1fd607a into zarr-developers:main Jan 16, 2023
@jrbourbeau jrbourbeau deleted the read-only-fixup branch January 17, 2023 18:22
@jrbourbeau
Copy link
Member Author

Thanks @rabernat @joshmoore!

enthusiastdev121 added a commit to enthusiastdev121/zarr-python that referenced this pull request Aug 19, 2024
* Ensure zarr.create uses writeable mode

* Update release.rst

Added release notes for [#1309](zarr-developers/zarr-python#1309)

* Switch to bug fix

Co-authored-by: Josh Moore <[email protected]>
Co-authored-by: Sanket Verma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dask to_zarr now complains about read-only after #1304 merged
6 participants