docs(RFC): Object Versioning #2602

Merged on Jul 9, 2023 (8 commits)
138 changes: 138 additions & 0 deletions core/src/docs/rfcs/2602_object_versioning.md
@@ -0,0 +1,138 @@
- Proposal Name: object_versioning
- Start Date: 2023-07-06
- RFC PR: [apache/incubator-opendal#2602](https://github.com/apache/incubator-opendal/pull/2602)
- Tracking Issue: [apache/incubator-opendal#2611](https://github.com/apache/incubator-opendal/issues/2611)

# Summary

This proposal describes the object versioning (or object version control) feature of OpenDAL.

# Motivation

Object storage services provide a simple and scalable way to store, organize, and access unstructured data.
These services store data as objects within buckets.
An object is a file together with any metadata that describes it, and a bucket is a container for objects.

The object versioning provided by these services
allows users to keep multiple versions of an object in the same bucket.
If users enable object versioning, each object will have a history of versions.
Each version is identified by a version ID, a string that is unique among the versions of that object.

(The terms object, bucket, and version ID used here are object storage concepts.
Different services may name them differently,
but they refer to the same things.)

OpenDAL already supports some of these services, such as S3, GCS, and Azure Blob Storage.
This RFC proposes adding support for object versioning to OpenDAL.

# Guide-level explanation

When object versioning is enabled, the following operations will be supported:

- `stat`: Get the metadata of an object with a specific version ID.
- `read`: Read a specific version of an object.
- `delete`: Delete a specific version of an object.

Code example:

```rust
// To get the current version ID of a file
let meta = op.stat("path/to/file").await?;
let version_id = meta.version().expect("just for example");

// To fetch the metadata of a specific version of a file
let meta = op.stat_with("path/to/file").version("version_id").await?;
let version_id = meta.version().expect("just for example"); // get the version ID

// To read a file with a specific version ID
let content = op.read_with("path/to/file").version("version_id").await?;

// To delete a file with a specific version ID
op.delete_with("path/to/file").version("version_id").await?;
```

# Reference-level explanation

With object versioning, these operations behave differently from the normal operations:

- `stat`: if no version ID is specified, it returns the metadata of the latest version of the file. The metadata gains a new field, `version`, that holds the version ID of the file.
- `read`: if no version ID is specified, it reads the latest version of the file.
- `delete`: if no version ID is specified, it deletes the latest version of the file. After that, reading the file without a version ID fails; users can still read the file by specifying the ID of a version that has not been deleted.

With object versioning, writing an object always creates a new version of the object rather than overwriting the old version.
This is imperceptible to the user:
the version ID is generated by the service itself, so users cannot specify it and cannot overwrite a historical version.
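
The write and read behavior described above can be illustrated with a small in-memory model. This is only a sketch for intuition, not OpenDAL code: `VersionedBucket`, its methods, and the counter-based IDs are all hypothetical, and real services generate opaque version IDs that are not guaranteed to be sequential. Deletion is left out on purpose.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of a versioned bucket (not OpenDAL code).
#[derive(Default)]
struct VersionedBucket {
    // Versions per path as (version ID, content), oldest first.
    // The last entry is the latest version.
    objects: HashMap<String, Vec<(String, Vec<u8>)>>,
    // Counter used only to make IDs unique in this model.
    next_id: u64,
}

impl VersionedBucket {
    /// Writing never overwrites: it appends a new version and returns
    /// the ID chosen by the "service". The caller cannot pick the ID.
    fn write(&mut self, path: &str, content: &[u8]) -> String {
        self.next_id += 1;
        let id = format!("v{}", self.next_id);
        self.objects
            .entry(path.to_string())
            .or_default()
            .push((id.clone(), content.to_vec()));
        id
    }

    /// Models the `version` field that `stat` reports for the latest version.
    fn latest_version(&self, path: &str) -> Option<&str> {
        self.objects.get(path)?.last().map(|(id, _)| id.as_str())
    }

    /// Reads the given version, or the latest one when `version` is `None`.
    fn read(&self, path: &str, version: Option<&str>) -> Option<&[u8]> {
        let versions = self.objects.get(path)?;
        let found = match version {
            Some(wanted) => versions.iter().find(|(id, _)| id.as_str() == wanted),
            None => versions.last(),
        };
        found.map(|(_, content)| content.as_slice())
    }
}
```

In this model, two writes to the same path yield two distinct IDs; a read without a version returns the second content, while a read with the first ID still returns the first content.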

To implement object versioning, we will do the following:

- Add a new field `version` to the `OpStat`, `OpRead` and `OpDelete` structs.
- Add a new field `version` to the `ObjectMetadata` struct.
- Add a new property (setter) `version` to the return value of the `stat_with` and `read_with` methods.
- Add a new method `delete_with` and add a new property (setter) `version` to its return value.
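
As a rough sketch of the first item, an operation argument struct could carry the version as an optional field with a builder-style setter. The struct name `OpRead` comes from this RFC; the method names `new`, `with_version`, and `version` below are assumptions made for illustration, and `OpStat` and `OpDelete` would presumably follow the same shape.

```rust
/// Sketch of read arguments carrying an optional version ID.
/// Method names are illustrative assumptions, not a confirmed API.
#[derive(Debug, Default, Clone)]
struct OpRead {
    // `None` means "operate on the latest version".
    version: Option<String>,
}

impl OpRead {
    /// Creates arguments that target the latest version.
    fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter: request a specific version.
    fn with_version(mut self, version: &str) -> Self {
        self.version = Some(version.to_string());
        self
    }

    /// Returns the requested version ID, if any.
    fn version(&self) -> Option<&str> {
        self.version.as_deref()
    }
}
```

Keeping the field an `Option` lets existing call sites that never set a version continue to mean "latest version".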

A service backend should support the following operations:

- `stat`: Get the metadata of an object with a specific version ID.
- `read`: Read a specific version of an object.
- `delete`: Delete a specific version of an object.
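
On the backend side, supporting these operations mostly means forwarding the version ID to the service. For S3, the version is selected with the `versionId` query parameter; other services use their own parameter names. The helper below is a hypothetical sketch (the name `object_url` and its signature are not from OpenDAL), and it skips percent-encoding for brevity.

```rust
/// Hypothetical helper: build the request URL for an S3-style backend.
/// Percent-encoding of the path and version ID is omitted in this sketch.
fn object_url(endpoint: &str, bucket: &str, path: &str, version: Option<&str>) -> String {
    let mut url = format!(
        "{}/{}/{}",
        endpoint.trim_end_matches('/'),
        bucket,
        path.trim_start_matches('/')
    );
    // Only append the parameter when a version was requested, so that
    // requests without a version keep targeting the latest version.
    if let Some(v) = version {
        url.push_str("?versionId=");
        url.push_str(v);
    }
    url
}
```

The same URL shape would serve `stat`, `read`, and `delete`, differing only in the HTTP method used.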

# Drawbacks

None.

# Rationale and alternatives

## What is object versioning?

Object versioning is a feature that allows users to keep multiple versions of an object in the same bucket.

It's a way to preserve, retrieve, and restore every version of every object stored in a bucket.

With object versioning, users can easily recover from both unintended user actions and application failures.

## How does object versioning work?

When object versioning is enabled, each object will have a history of versions. Each version is identified by a version ID, a string that is unique among the versions of that object.

The version ID is not a timestamp.
It is not guaranteed to be sequential.
Many object storage services produce object version IDs by themselves, using their own algorithms.
Users cannot specify the version ID when writing an object.

## Will object versioning affect the existing code?

For the caller, writing an object works the same way whether or not object versioning is enabled.
When it is enabled, the storage service always creates a new version of the object rather than overwriting the old version.
This is imperceptible to the user, so existing code is not affected.

## What are the benefits of object versioning?

With object versioning, users can:

- Track the history of a file.
- Implement optimistic concurrency control.
- Implement a simple backup system.

## References

- [AWS S3 Object Versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html)
- [How does AWS S3 object versioning work?](https://docs.aws.amazon.com/AmazonS3/latest/userguide/versioning-workflows.html)
- [How to enable object versioning for a bucket in AWS S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html)
- [Google Cloud Storage Object Versioning](https://cloud.google.com/storage/docs/object-versioning)
- [Azure Blob Storage Object Versioning](https://docs.microsoft.com/en-us/azure/storage/blobs/versioning-overview)

# Prior art

None.

# Unresolved questions

None.

# Future possibilities

Implement a new method `list_versions` to list all versions of an object.

3 changes: 3 additions & 0 deletions core/src/docs/rfcs/mod.rs
@@ -142,3 +142,6 @@ pub mod rfc_2133_append_api {}

#[doc = include_str!("2299_chain_based_operator_api.md")]
pub mod rfc_2299_chain_based_operator_api {}

#[doc = include_str!("2602_object_versioning.md")]
pub mod rfc_2602_object_versioning {}