Impossible to set non-default storage project since 0.5.0 #832

Closed
mwitkow opened this issue Apr 16, 2015 · 7 comments
Labels
api: storage Issues related to the Cloud Storage API.

Comments


mwitkow commented Apr 16, 2015

The upgrade to 0.5.0 broke the public API, without any mention in the release notes, by removing the storage.get_connection(project='foo') function.

Moreover, it seems that the Connection class no longer has a project parameter. Neither storage.get_bucket nor storage.lookup_bucket take project as a parameter. Curiously, you can override it in create_bucket and list_buckets.

The only way to set a project for these is through _helpers.set_default_project, which makes it incredibly hard to operate on buckets in multiple projects: you need to set the default all the time, exposing yourself to race conditions when multithreading.

Can we put project back into Connection? AFAIK a Connection's credentials only make sense for a given project anyway.
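
For concreteness, here is roughly the dance I'm describing (just a sketch; the exact import path for set_default_project and the project/bucket names are made up):

from gcloud import storage
from gcloud._helpers import set_default_project  # hypothetical import path

# Mutates process-wide state: every thread sees the new default.
set_default_project('project-a')
storage.create_bucket('bucket-in-project-a')

# If another thread swaps the default between these two calls, the
# bucket below is created in the wrong project.
set_default_project('project-b')
storage.create_bucket('bucket-in-project-b')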

dhermes (Contributor) commented Apr 16, 2015

@mwitkow-io Thanks a ton for filing this! It's really great to hear that

  1. We have users
  2. Our users are upgrading
  3. Our users care enough to reach out

Sorry the release notes didn't clarify some of these things. That's on me.


The storage.get_connection function was not removed; its signature changed. It no longer takes a project, and in fact has no arguments: now it is just storage.get_connection(). The project argument was removed because a project is no longer bound to a connection (as you mentioned).
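
In other words, where you previously wrote storage.get_connection(project='foo'), you now write just (a minimal sketch):

from gcloud import storage

# No project argument anymore; the connection only carries credentials.
connection = storage.get_connection()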


We removed project from Connection based on discussion with the API team in #726. A project is only relevant when creating or listing buckets, so it was overkill to bind it to a connection when only 2 of the 34 API methods used it.
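
For those two operations you now pass the project explicitly (a sketch, with made-up bucket and project names):

from gcloud import storage

connection = storage.get_connection()

# The only two calls that need a project take it directly.
new_bucket = storage.create_bucket('my-new-bucket', project='my-project',
                                   connection=connection)
for bucket in storage.list_buckets(project='my-project',
                                   connection=connection):
    print(bucket.name)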


RE: "makes it incredibly hard to operate on buckets in multiple projects": what is the issue you are experiencing? The bucket name (Bucket instance) and the credentials on your connection should be sufficient to operate on different buckets.

mwitkow (Author) commented Apr 16, 2015

Well, I blame GitHub issues not being as nice as Buganizer ;)

Where does Connection store the project name? The only thing it has access to in the credentials (e.g. in the JSON file) is the project ID that prefixes the client_id or client_email of the service account.

The Bucket also doesn't contain any metadata about the project itself, as far as I can see.

Can you give me an example of how to use the new API against two separate projects (with separate credentials objects) and get a Bucket handle for each, so I can create/delete a Blob by name?

dhermes (Contributor) commented Apr 16, 2015

RE: "Where does Connection store the project name?" It doesn't. The project is not bound to the connection.

RE: "The Bucket also doesn't contain any metadata about the project itself if I can see correctly." It doesn't need to. The bucket name is the only identifier needed.


Here is an example (we are working on documenting and improving this, see #805 and #830):

from gcloud.credentials import get_for_service_account_json
from gcloud import storage

# Working on making this part easier / shorter
creds1 = get_for_service_account_json('path/to/key1.json').create_scoped(
    storage.SCOPE)
conn1 = storage.Connection(credentials=creds1)

creds2 = get_for_service_account_json('path/to/key2.json').create_scoped(
    storage.SCOPE)
conn2 = storage.Connection(credentials=creds2)

# Buckets are looked up by name alone; the connection supplies the
# credentials, not the project.
bucket1 = storage.get_bucket('bucketname1', connection=conn1)
bucket2 = storage.get_bucket('bucketname2', connection=conn2)

bucket1.delete_blob('blob-name1.txt')
bucket2.delete_blob('blob-name2.txt')

tseaver (Contributor) commented Apr 16, 2015

@mwitkow-io bucket names are globally unique (not per-project), which means you don't need to know the project ID except when creating a new bucket or listing the buckets associated with a given project.

mwitkow (Author) commented Apr 17, 2015

Oh, indeed. I completely forgot that buckets are globally identifiable.

mwitkow (Author) commented Apr 17, 2015

Please feel free to close :) (I wish I could)

tseaver closed this as completed Apr 17, 2015
dhermes (Contributor) commented Apr 17, 2015

@mwitkow-io I really appreciate the feedback! Please feel free to open more and let us know.

I am particularly interested in the "need to set the default all the time, exposing yourself to race conditions when multithreading" part.

I'd love to hear more about your workload. We have better support for multithreading with our batches than we do for global defaults.

I have a hunch that you'd rather be passing an explicit connection / dataset ID / project in a multithreaded environment, but again I'd love to hear how it's working in the wild.
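
For instance, something like this (just a sketch; the worker function and all names are hypothetical) avoids the shared default entirely:

import threading

from gcloud import storage
from gcloud.credentials import get_for_service_account_json

def delete_blob(key_path, bucket_name, blob_name):
    # Each thread builds its own connection from its own credentials;
    # nothing shared, nothing to race on.
    creds = get_for_service_account_json(key_path).create_scoped(
        storage.SCOPE)
    connection = storage.Connection(credentials=creds)
    bucket = storage.get_bucket(bucket_name, connection=connection)
    bucket.delete_blob(blob_name)

threads = [
    threading.Thread(target=delete_blob, args=(
        'path/to/key1.json', 'bucketname1', 'blob-name1.txt')),
    threading.Thread(target=delete_blob, args=(
        'path/to/key2.json', 'bucketname2', 'blob-name2.txt')),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()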
