Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support returning versionId on s3 write #282

Closed
hannahilea opened this issue Jun 1, 2023 · 4 comments
Closed

Support returning versionId on s3 write #282

hannahilea opened this issue Jun 1, 2023 · 4 comments
Assignees

Comments

@hannahilea
Copy link
Contributor

When writing objects to s3, we regularly want to capture the stored object's versionId.

From
https://github.com/JuliaCloud/AWS.jl/blob/a7dae3daa06923754dad420a00f1b0a76ae3ddd2/src/services/s3.jl#L5607
this information is returned on s3_put:

If you enable versioning for a bucket, Amazon S3 automatically generates a unique version ID for the object being stored. Amazon S3 returns this ID in the response.

How could we access that information on write? A few options:

  • A. Update default write behavior: have write return a versioned path instead of current return value (which is the parsed returned buffer stream, which seems to default to an empty string??)
  • B. Add option to current write function to toggle return value to be versioned path instead of default
  • C. Update default write behavior to additionally return either the versioned path or the full response header
  • D. Add separate write function that returns a versioned path

Option D is nice in that it doesn't change any default behavior, but A (or C) would be nicer for just getting response versions "for free" on all write operations, which seems like it could be widely desirable (for definitions of "widely" that are maybe just "my team", hah).

While it's possible that folks depend on the current output of write, there are currently no tests in this testset for output (type or content). 😢


In any case, a stub of this requested output would be to swap out

return parse(S3.put_object(bucket, path, args; aws_config=aws, kwargs...))

with something more like

r = S3.parse(S3.put_object(bucket, path, args; aws_config=aws, kwargs...))
version = get(Dict(r.headers), "x-amz-version-id", nothing)
return S3Path(path; version)

(as implemented by @ericphanson in a non-public repo)
or

r = S3.put_object(path.bucket, path.key, params; aws_config)
version = get(Dict(r.headers), "x-amz-version-id", nothing)
return parse(r), r.headers
@hannahilea
Copy link
Contributor Author

Personal preference for option A (or a bastardized version thereof): have put_object return just the version string OR the versioned path

@hannahilea
Copy link
Contributor Author

hannahilea commented Jun 5, 2023

In some private code, my current (hopefully short-term...) workaround is to pirate with
(EDIT: updated since previous workaround did not work!)

@service S3 use_response_type = true

function AWSS3.s3_put(
    aws::AbstractAWSConfig,
    bucket,
    path,
    data::Union{String,Vector{UInt8}},
    data_type="",
    encoding="";
    acl::AbstractString="",
    metadata::AWSS3.SSDict=AWSS3.SSDict(),
    tags::AbstractDict=AWSS3.SSDict(),
    kwargs...,
)
    headers = Dict{String,Any}(["x-amz-meta-$k" => v for (k, v) in metadata])

    if isempty(data_type)
        data_type = "application/octet-stream"
        ext = splitext(path)[2]
        for (e, t) in [
            (".html", "text/html"),
            (".js", "application/javascript"),
            (".pdf", "application/pdf"),
            (".csv", "text/csv"),
            (".txt", "text/plain"),
            (".log", "text/plain"),
            (".dat", "application/octet-stream"),
            (".gz", "application/octet-stream"),
            (".bz2", "application/octet-stream"),
        ]
            if ext == e
                data_type = t
                break
            end
        end
    end

    headers["Content-Type"] = data_type

    if !isempty(tags)
        headers["x-amz-tagging"] = AWS.URIs.escapeuri(tags)
    end

    if !isempty(acl)
        headers["x-amz-acl"] = acl
    end

    if !isempty(encoding)
        headers["Content-Encoding"] = encoding
    end

    args = Dict("body" => data, "headers" => headers)
    r = S3.put_object(bucket, path, args; aws_config=aws, kwargs...)
    version = get(Dict(r.headers), "x-amz-version-id", nothing)
    return S3Path(bucket, path; version)
end

function Base.open(f::Function, path::S3Path, args...; kwargs...)
    buffer = open(path, args...; kwargs...)
    try
        f(buffer)
    finally
        versioned_path = if iswritable(buffer)
            seekstart(buffer)
            write(buffer.path, read(buffer.io))
        else
            nothing
        end
        close(buffer.io)
        return versioned_path
    end
end

@hannahilea
Copy link
Contributor Author

sketchy notes from convo w/ @omus, @kleinschmidt, @a-cakir:

@hannahilea will take first pass at drafting changes.

  • raw flag -> into s3_put (return_raw)
  • inside write call, pass in raw flag “get versioned s3 path” out -> return a new versioned s3 path (return_path flag)

COULD but not going to (now) also add same flag to base.open; curt has grand vision for multipart buffer writing that would be nicer, that we will not do now but should file for future. :)
If we also wanted to add an S3Buffer (which we do, just not quite yet), would look something like:

@hannahilea
Copy link
Contributor Author

Closing, as the original issue is now addressed. We can open further issues as needed for supporting an S3Buffer etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant