[BigQuery, Storage]: resumable uploads fail with non-retryable GoogleAPICallError: 410 PUT Service Temporarily Unavailable #7530

Closed
bencaine1 opened this issue Mar 18, 2019 · 10 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. api: storage Issues related to the Cloud Storage API. backend external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@bencaine1

OS: Linux dc32b7e8763a 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux
Python version: Python 2.7.6
google-cloud-bigquery: 1.8.0

We're getting a new flake we've never seen before. Furthermore, the load_table_from_file function doesn't take any retry parameters, so we can't patch this on our end unless we want to implement our own retry.

Code sample:

job = self.gclient.load_table_from_file(output, table.reference,
                                        job_config=job_config,
                                        rewind=True)

Stack trace:

  File "/usr/local/lib/python2.7/dist-packages/verily/bigquery_wrapper/bq.py", line 467, in populate_table
    rewind=True)
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/client.py", line 1034, in load_table_from_file
    raise exceptions.from_http_response(exc.response)
GoogleAPICallError: 410 PUT https://www.googleapis.com/upload/bigquery/v2/projects/packard-campbell-int-testing/jobs?uploadType=resumable&upload_id=AEnB2UqNXWJ2JosgkMwAFBbBrCuHEl4S8F354ozSfIgwqKFsfn2hqADnUNynvWYge-gWk53mpTa6-7xMlDDS60XH2oJeHFWprA: Service Temporarily Unavailable
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Mar 19, 2019
@tseaver tseaver added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. api: bigquery Issues related to the BigQuery API. backend priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. labels Mar 19, 2019
@tseaver
Contributor

tseaver commented Mar 19, 2019

Wow, that is definitely the wrong error response:

  • 410 means GONE, as in "never coming back."
    The error should be `503, ServiceUnavailable`.

@tswast Can you check with the back-end team for the source of this error response?

@tswast
Contributor

tswast commented Mar 19, 2019

Yikes. Filed bug 128935544 internally.

@sduskis
Contributor

sduskis commented Jun 13, 2019

This is a service issue. I don't think we should keep this issue open.

@sduskis sduskis closed this as completed Jun 13, 2019
@tswast
Contributor

tswast commented Aug 23, 2019

There is discussion on internal bug 115694647 as well as public issue https://issuetracker.google.com/137168102. This issue affects all services.

There are cases where one of the API calls for a resumable upload gets stuck and the upload cannot be continued. The client would need to start the upload from the beginning when such a failure happens, but currently does not, because retries are handled at the request level. Some manual logic would be needed to solve this. We might even need to generate a new job ID, though. Not sure how far BigQuery gets into job creation if the upload fails.
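To illustrate why request-level retries cannot help here, the following is a toy model only, not the real upload machinery. `Gone`, `ToyUploadSession`, and `send_all` are hypothetical names invented for this sketch; the assumption is simply that once an upload session has failed irrecoverably, every later request against the same session fails the same way.

```python
class Gone(Exception):
    """Stand-in for an HTTP 410 error; not a real library class."""
    code = 410


class ToyUploadSession(object):
    """Toy model of one resumable-upload session (one upload_id)."""

    def __init__(self, fail_at_chunk=None):
        self.received = []
        self.dead = False
        self._fail_at_chunk = fail_at_chunk

    def put_chunk(self, chunk):
        # Once the session is dead, every request against it fails again.
        if self.dead or len(self.received) == self._fail_at_chunk:
            self.dead = True
            raise Gone("410 PUT: upload session is gone")
        self.received.append(chunk)


def send_all(session, chunks, retries_per_request=3):
    """Send every chunk, retrying each individual request a few times."""
    for chunk in chunks:
        for attempt in range(retries_per_request):
            try:
                session.put_chunk(chunk)
                break
            except Gone:
                # Re-sending the same request to a dead session cannot succeed.
                if attempt == retries_per_request - 1:
                    raise
```

With `ToyUploadSession(fail_at_chunk=1)`, `send_all` exhausts its per-request retries and raises, while a brand-new `ToyUploadSession()` given the same chunks from the start completes normally.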

@tswast tswast reopened this Aug 23, 2019
@tswast tswast added the api: storage Issues related to the Cloud Storage API. label Aug 23, 2019
@tswast tswast changed the title from "BigQuery: load_table_from_file gives flaky GoogleAPICallError: 410 PUT Service Temporarily Unavailable" to "[BigQuery, Storage]: resumable uploads fail with non-retryable GoogleAPICallError: 410 PUT Service Temporarily Unavailable" Aug 23, 2019
@tswast tswast removed their assignment Aug 23, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Sep 14, 2019
@tseaver
Contributor

tseaver commented Sep 16, 2019

@tswast I'm not sure how we are supposed to deal with the 410 Gone response from within a client library: that is the exact opposite of a transient error code: it is supposed to mean that the resource is utterly destroyed / never existed.

The "gsutil Retry Handling Strategy" doc does not include 410 as one of the transient failures for which automatic retry is appropriate.

@tseaver tseaver added external This issue is blocked on a bug with the actual product. and removed 🚨 This issue needs some love. labels Sep 16, 2019
@tswast
Contributor

tswast commented Sep 25, 2019

410 is correct, actually. There is an internal "upload" resource that has had an irrecoverable failure. The only way to recover is by starting the upload flow with a fresh request from the very beginning of the file / stream.

@tswast
Contributor

tswast commented Sep 25, 2019

> Not sure how far BigQuery gets into job creation if the upload fails.

Supposedly, BigQuery doesn't create the job, so we should be able to use the same job ID.
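If that holds, a caller could pick the job ID once and reuse it for every restart of the upload. A minimal sketch, assuming the load call accepts a `job_id` keyword (`load_table_from_file` has such a parameter, but check the installed version); `bind_job_id` and the `"load-"` prefix are hypothetical names for illustration:

```python
import functools
import uuid


def bind_job_id(load, prefix="load-"):
    """Generate one job ID and return it with ``load`` bound to that ID.

    ``load`` is any callable that accepts a ``job_id`` keyword argument.
    Every call made through the returned callable reuses the same job ID.
    """
    job_id = prefix + uuid.uuid4().hex
    return job_id, functools.partial(load, job_id=job_id)
```

Whether reusing the ID is actually safe depends on the "supposedly" above: if BigQuery did create the job before the upload failed, a second attempt with the same ID would be expected to conflict.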

@tseaver
Contributor

tseaver commented Sep 26, 2019

@tswast

> 410 is correct, actually.

Not according to RFC 2616, which states:

> The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent.

The target for the PUT request is https://www.googleapis.com/upload/bigquery/v2/projects/packard-campbell-int-testing/jobs, which is not "gone" at all: we expect an identical subsequent request to the same resource to succeed.

That the server crapped out somehow while trying to process the PUT request means that it should be a 50x response: there is no valid 40x response which matches that semantic (40x implies that there is something defective with the client's request).

One could surely argue that the API should require a POST request here (because the URL itself doesn't identify the resource uniquely, without the query string), but neither method should expect to retry any 40x response code.

@tswast
Contributor

tswast commented Sep 26, 2019

The point I'm trying to make is that we can't retry the request identically. Resumable uploads are a multi-request operation. This failure occurs partway through an upload. To retry the upload, we need to seek back to the beginning of the file and start the whole process over.
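A caller-side workaround along those lines could look like the sketch below. It is not part of the client library: `upload_with_restart` and `start_upload` are hypothetical names, and the sketch assumes the raised error exposes the HTTP status as a `code` attribute (the `GoogleAPICallError: 410` in the stack trace suggests it does).

```python
import time


def upload_with_restart(start_upload, file_obj, max_attempts=3,
                        backoff_seconds=1.0):
    """Run ``start_upload(file_obj)``, restarting from byte 0 on HTTP 410.

    ``start_upload`` is a hypothetical callable that performs the whole
    resumable upload: it opens a new session and sends every chunk.
    """
    for attempt in range(1, max_attempts + 1):
        # Each attempt is a brand-new upload, so rewind the stream first.
        file_obj.seek(0)
        try:
            return start_upload(file_obj)
        except Exception as exc:
            # Assumption: the error carries the HTTP status in ``.code``.
            # Anything other than 410, or the last attempt, is re-raised.
            if getattr(exc, "code", None) != 410 or attempt == max_attempts:
                raise
            # Exponential backoff before starting over.
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
```

For the report above, `start_upload` might be something like `lambda f: client.load_table_from_file(f, table_ref, job_config=job_config)`, where `client` and `table_ref` stand for the caller's own objects. The stream has to be seekable for this to work; a non-seekable stream would need to be buffered or regenerated instead.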

@crwilcox
Contributor

Closing in favor of a fresh feature request. Thanks everyone for the discussion.


6 participants