Impossible to set non-default storage project since 0.5.0 #832

Closed
mwitkow opened this issue Apr 16, 2015 · 7 comments
Labels
api: storage Issues related to the Cloud Storage API.

Comments


mwitkow commented Apr 16, 2015

The upgrade to 0.5.0 broke the public API, without any mention in the release notes, by removing the storage.get_connection(project='foo') function.

Moreover, it seems that the Connection class no longer has a project parameter. Neither storage.get_bucket nor storage.lookup_bucket take project as a parameter. Curiously, you can override it in create_bucket and list_buckets.

The only way to set a project for these is through _helpers.set_default_project, which makes it incredibly hard to operate on buckets in multiple projects: you need to set the default all the time, exposing yourself to race conditions when multithreading.

Can we put project back into Connection? AFAIK a Connection's credentials only make sense for a given project anyway.
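
For concreteness, here is roughly the dance I'm describing (just a sketch; the exact import path for set_default_project and the project/bucket names are made up):

from gcloud import storage
from gcloud._helpers import set_default_project  # hypothetical import path

# Mutates process-wide state: every thread sees the new default.
set_default_project('project-a')
storage.create_bucket('bucket-in-project-a')

# If another thread swaps the default between these two calls, the
# bucket below is created in the wrong project.
set_default_project('project-b')
storage.create_bucket('bucket-in-project-b')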

dhermes (Contributor) commented Apr 16, 2015

@mwitkow-io Thanks a ton for filing this! It's really great to hear that

  1. We have users
  2. Our users are upgrading
  3. Our users care enough to reach out

Sorry the release notes didn't clarify some of these things. That's on me.


The storage.get_connection function was not removed; its signature changed. It no longer takes a project, and in fact has no arguments: now it is just storage.get_connection(). The project argument was removed because a project is no longer bound to a connection (as you mentioned).
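
In other words, where you previously wrote storage.get_connection(project='foo'), you now write just (a minimal sketch):

from gcloud import storage

# No project argument anymore; the connection only carries credentials.
connection = storage.get_connection()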


We removed project from Connection based on discussion with the API team in #726. A project is only relevant when creating or listing buckets, so it was overkill to bind it to a connection when only 2 of the 34 API methods used it.
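
For those two operations you now pass the project explicitly (a sketch, with made-up bucket and project names):

from gcloud import storage

connection = storage.get_connection()

# The only two calls that need a project take it directly.
new_bucket = storage.create_bucket('my-new-bucket', project='my-project',
                                   connection=connection)
for bucket in storage.list_buckets(project='my-project',
                                   connection=connection):
    print(bucket.name)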


RE: "makes it incredibly hard to operate on buckets in multiple projects": what is the issue you are experiencing? The bucket name (Bucket instance) and the credentials on your connection should be sufficient to operate on different buckets.

mwitkow (Author) commented Apr 16, 2015

Well, I blame GitHub issues not being as nice as Buganizer ;)

Where does Connection store the project name? The only thing it has access to in the credentials (e.g. in the JSON file) is the project ID that prefixes the client_id or client_email of the service account.

The Bucket also doesn't contain any metadata about the project itself, as far as I can see.

Can you give me an example of how to use the new API against two separate projects (with separate credentials objects) and get a Bucket handle for each, so I can create/delete a Blob by name?

dhermes (Contributor) commented Apr 16, 2015

RE: "Where does Connection store the project name?" It doesn't. The project is not bound to the connection.

RE: "The Bucket also doesn't contain any metadata about the project itself if I can see correctly." It doesn't need to. The bucket name is the only identifier needed.


Here is an example (we are working on documenting and improving this, see #805 and #830):

from gcloud.credentials import get_for_service_account_json
from gcloud import storage

# Working on making this part easier / shorter
creds1 = get_for_service_account_json('path/to/key1.json').create_scoped(
    storage.SCOPE)
conn1 = storage.Connection(credentials=creds1)

creds2 = get_for_service_account_json('path/to/key2.json').create_scoped(
    storage.SCOPE)
conn2 = storage.Connection(credentials=creds2)

# Buckets are looked up by name alone; the connection supplies the
# credentials, not the project.
bucket1 = storage.get_bucket('bucketname1', connection=conn1)
bucket2 = storage.get_bucket('bucketname2', connection=conn2)

bucket1.delete_blob('blob-name1.txt')
bucket2.delete_blob('blob-name2.txt')

tseaver (Contributor) commented Apr 16, 2015

@mwitkow-io bucket names are globally unique (not per-project), which means you don't need to know the project ID except when creating a new bucket or listing the buckets associated with a given project.

mwitkow (Author) commented Apr 17, 2015

Oh, indeed. I completely forgot that buckets are globally identifiable.

mwitkow (Author) commented Apr 17, 2015

Please feel free to close :) (I wish I could)

tseaver closed this as completed Apr 17, 2015
dhermes (Contributor) commented Apr 17, 2015

@mwitkow-io I really appreciate the feedback! Please feel free to open more and let us know.

I am particularly interested in the "need to set the default all the time, exposing yourself to race conditions when multithreading" part.

I'd love to hear more about your workload. We have better support for multithreading with our batches than we do for global defaults.

I have a hunch that you'd rather be passing an explicit connection / dataset ID / project in a multithreaded environment, but again I'd love to hear how it's working in the wild.
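
For instance, something like this (just a sketch; the worker function and all names are hypothetical) avoids the shared default entirely:

import threading

from gcloud import storage
from gcloud.credentials import get_for_service_account_json

def delete_blob(key_path, bucket_name, blob_name):
    # Each thread builds its own connection from its own credentials;
    # nothing shared, nothing to race on.
    creds = get_for_service_account_json(key_path).create_scoped(
        storage.SCOPE)
    connection = storage.Connection(credentials=creds)
    bucket = storage.get_bucket(bucket_name, connection=connection)
    bucket.delete_blob(blob_name)

threads = [
    threading.Thread(target=delete_blob, args=(
        'path/to/key1.json', 'bucketname1', 'blob-name1.txt')),
    threading.Thread(target=delete_blob, args=(
        'path/to/key2.json', 'bucketname2', 'blob-name2.txt')),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()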
