I have a general question in regard to this statement in the documentation:

> Support for read/writeback caching and transactions, with strong atomicity, isolation, consistency, and durability (ACID) guarantees.
and this sentence in the Blog:
> Safety of parallel operations when many machines are accessing the same dataset is achieved through the use of optimistic concurrency, which maintains compatibility with diverse underlying storage layers (including Cloud storage platforms, such as GCS, as well as local filesystems) without significantly impacting performance. TensorStore also provides strong ACID guarantees for all individual operations executing within a single runtime.
I created a dummy dataset with the zarr + S3 drivers (roughly along the lines of the sketch below), and then created a situation where the next write to chunk 0.0.3 would fail. Running the write under a transaction threw an error on commit, but the S3 bucket afterwards still contained some of the objects written by the failed transaction.
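A minimal sketch of this kind of setup is shown below. The bucket name, dtype, shape, and chunking are placeholders rather than the exact spec used (only the object path comes from the error message quoted further down), and the failure itself was induced externally, e.g. by breaking write access to one chunk object:

```python
import tensorstore as ts

# Hypothetical sketch of the setup described above; bucket, dtype, shape,
# and chunk size are placeholders.
spec = {
    "driver": "zarr",
    "kvstore": {
        "driver": "s3",
        "bucket": "my-bucket",            # placeholder
        "path": "ts/yang-test-dataset/",  # path taken from the error message below
    },
    "metadata": {
        "dtype": "<f4",
        "shape": [100, 100, 100],
        "chunks": [25, 25, 25],
    },
}

ds = ts.open(spec, create=True, delete_existing=True).result()

# Write a region spanning several chunks under a (non-atomic) transaction.
# If the upload of one chunk object (say 0.0.3) fails, the commit raises,
# but other chunk objects written by the same transaction may already be
# present in the bucket.
try:
    with ts.Transaction() as txn:
        ds.with_transaction(txn)[0:50, 0:50, 0:100] = 1.0
except Exception as e:
    print("commit failed:", e)
```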
So from the perspective of an observer (who may eventually want to load this dataset again), the operation does not appear to be transactional. When the blog says "transactional within a single runtime", do you mean that the process's view of `ds` when the context manager exits is transactional, but that no guarantees are made about the state of the underlying storage?
If one sets

```python
with ts.Transaction(atomic=True) as txn:
    ...
```
then if a write would span multiple chunks, I see an error:

```
ValueError: Cannot read/write "ts/yang-test-dataset/.zarray" and read/write "ts/yang-test-dataset/0.0.0" as single atomic transaction [source locations='tensorstore/internal/cache/kvs_backed_cache.h:221\ntensorstore/internal/cache/async_cache.cc:660\ntensorstore/internal/cache/async_cache.h:383\ntensorstore/internal/cache/chunk_cache.cc:438\ntensorstore/internal/grid_partition.cc:246\ntensorstore/internal/grid_partition.cc:246\ntensorstore/internal/grid_partition.cc:246']
```
I'm guessing this is expected since you have no way of performing a transactional write across multiple S3 objects?
Lastly, on the topic of "optimistic concurrency and compatibility with GCS/other storage layers", since AFAIK S3 does not support conditional PUTs the way that GCS does, is there a possibility of data loss when using S3?
Thanks in advance!
The S3 support was added recently, but we do indeed need to clarify its limitations in the documentation.
S3 lacks conditional write support, and with multiple concurrent writes to the same object it is indeed possible that some writes will be lost.
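To make the lost-update risk concrete, here is a toy illustration (not TensorStore code; the `Store` class and its methods are invented for the example) of the difference between an unconditional PUT and a conditional, generation-checked write of the kind GCS provides:

```python
# Illustrative only. With a compare-and-swap ("write only if the generation is
# unchanged"), a concurrent modification is detected and the writer retries;
# with a plain unconditional PUT (S3 without conditional-write support), the
# later writer silently overwrites the earlier one and that update is lost.

class Store:
    def __init__(self):
        self.value, self.generation = None, 0

    def read(self):
        return self.value, self.generation

    def put(self, value):  # unconditional PUT (S3-style)
        self.value, self.generation = value, self.generation + 1

    def put_if_unchanged(self, value, expected_generation):  # conditional write (GCS-style)
        if self.generation != expected_generation:
            return False  # precondition failed; caller must re-read and retry
        self.put(value)
        return True

def read_modify_write(store, update, conditional):
    while True:
        value, gen = store.read()
        new_value = update(value)
        if not conditional:
            store.put(new_value)  # may clobber a concurrent writer's update
            return
        if store.put_if_unchanged(new_value, gen):
            return  # otherwise loop: optimistic-concurrency retry

store = Store()
store.put(0)
read_modify_write(store, lambda v: (v or 0) + 1, conditional=True)
print(store.read())  # (1, 2)
```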
There is a strategy for implementing atomic writes on S3 under certain assumptions about timestamps, but it would require a list operation in order to read, which may be costly. When using this strategy with ocdbt, only a single list operation would be needed for the manifest; subsequent reads (using the cached manifest) would be normal read operations, and multi-key atomic transactions could also be supported. (Currently a small amount of work remains to actually support both S3 and multi-key atomic operations with ocdbt.)
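For reference, a spec layering ocdbt on top of S3 would presumably look something like the sketch below (bucket and path are placeholders, and, as noted above, this combination is not yet fully supported for multi-key atomic commits at the time of writing):

```python
import tensorstore as ts

# Hypothetical sketch: zarr data stored through the ocdbt adapter on S3.
# Bucket and path are placeholders.
spec = {
    "driver": "zarr",
    "kvstore": {
        "driver": "ocdbt",
        "base": {"driver": "s3", "bucket": "my-bucket", "path": "ts/yang-test-dataset/"},
    },
}

ds = ts.open(spec).result()

# Because ocdbt commits a transaction by writing new B+tree nodes and then
# updating a single manifest, a write spanning multiple chunks could commit
# atomically even though the underlying store is S3.
with ts.Transaction(atomic=True) as txn:
    ds.with_transaction(txn)[...] = 0.0
```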