[QUESTION][BigQuery] would it be possible to have extract_table_to_storage support with an optional query #2361

Closed
vrcs opened this issue Sep 20, 2016 · 3 comments
Labels
api: bigquery Issues related to the BigQuery API. type: question Request for information or clarification. Not an issue.

Comments


vrcs commented Sep 20, 2016

Hi

It would be great to have a query as an optional field on the extract_table_to_storage method in google.cloud.bigquery.client, so that code could extract the data from the desired table with a custom query and write it to Google Cloud Storage. Or is there any other way to write a custom query's results directly after creating the job?

import uuid

from google.cloud import bigquery

client = bigquery.Client(project='test-sample')
query = """\
SELECT * FROM test_database.activity
WHERE activity_date = DATE_ADD(CURRENT_DATE(), -1, 'DAY')
ORDER BY activity_date;"""
job_name = 'test_job_name_' + str(uuid.uuid4())
# Hypothetical call: extract_table_to_storage does not currently accept a
# query job; `destination` would be a gs:// URI.
job = client.extract_table_to_storage(
    client.run_async_query(job_name, query), destination)

Or is there any other workaround? I have some partitioned tables and some non-partitioned tables, and I sometimes need to export data from both for a given date, for analysis with other tools.

Thanks

@daspecster daspecster added type: question Request for information or clarification. Not an issue. api: bigquery Issues related to the BigQuery API. labels Sep 20, 2016

tseaver commented Sep 20, 2016

@vrcs The back-end extract operation doesn't support such an option. To achieve this goal, you could:

  • Run an asynchronous QueryJob, passing a temporary table as its destination.
  • Run an ExtractJob on that temporary table.

Untested sketch:

import time
from google.cloud.bigquery import Client

client = Client()
dataset = client.dataset('my_dataset')
temp_table = dataset.table('query-results')
assert not temp_table.exists()
query = 'SELECT * FROM my_dataset.widgets WHERE frob_count >= 100'

query_job = client.run_async_query('query-frobbed-widgets', query)
query_job.destination = temp_table
query_job.begin()

while query_job.state.lower() != 'done':
    query_job.reload()
    time.sleep(1)

extract_job = client.extract_table_to_storage(
    'extract-query-results', temp_table,
    'gs://my-bucket-name/frobbed-widgets*')
extract_job.destination_format = 'CSV'
extract_job.begin()

while extract_job.state.lower() != 'done':
    extract_job.reload()
    time.sleep(1)
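Both polling loops above follow the same reload-until-done pattern. As a hedged aside, that pattern could be factored into a small helper; `wait_for_job` and its timeout are my own additions, not part of the library. It only assumes the job object has a `reload()` method and a `state` attribute:

```python
import time


def wait_for_job(job, poll_interval=1.0, timeout=300.0):
    """Poll ``job.reload()`` until ``job.state`` is 'DONE' or the timeout expires."""
    deadline = time.time() + timeout
    while True:
        job.reload()  # refresh the job's state from the server
        if (job.state or '').lower() == 'done':
            return job
        if time.time() >= deadline:
            raise TimeoutError('job did not finish within %s seconds' % timeout)
        time.sleep(poll_interval)
```

With this helper, each bare `while ... reload ... sleep` loop in the sketch above collapses to a single `wait_for_job(query_job)` call, and a hung job raises instead of looping forever.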

Please feel free to re-open the issue if that example doesn't help.

@tseaver tseaver closed this as completed Sep 20, 2016

vrcs commented Sep 21, 2016

Got it working. I needed to set legacy SQL to true so that the expression DATE_ADD(CURRENT_DATE(), -1, 'DAY'), used to get yesterday's date, would parse (because of the quotes in the expression).
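For reference (my own aside, not from the thread): DATE_ADD(CURRENT_DATE(), -1, 'DAY') is legacy-SQL syntax, which is why the legacy dialect was needed; in standard SQL the same "yesterday" filter is typically written with DATE_SUB. A sketch of the two query strings side by side, with an illustrative table name:

```python
# Legacy-SQL form, as reported working in the comment above.
legacy_query = (
    "SELECT * FROM test_database.activity "
    "WHERE activity_date = DATE_ADD(CURRENT_DATE(), -1, 'DAY')"
)

# Standard-SQL equivalent (note the backtick-quoted table name
# and the INTERVAL syntax instead of a quoted unit string).
standard_query = (
    "SELECT * FROM `test_database.activity` "
    "WHERE activity_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)"
)
```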

Thank you very much @tseaver


tseaver commented Sep 21, 2016

@vrcs I'm glad you're up and running. We are working toward full support for the "Standard SQL" dialect: #2229, #2354. Please add another issue if those don't cover your case.
