feat: add `job_retry` argument to `load_table_from_uri` #969
Comments
Here's a stacktrace from a Googler who tried to reproduce this on their own project. Indeed, the exception does throw from …
Having this exact problem in a Cloud Function triggered when data is uploaded to a Cloud Storage bucket. Right now I'm considering the Cloud Functions retry option, but I plan to add monitoring on top of the function and want to keep its logs clean even when a retry succeeds. So for now I'm implementing exponential backoff in case of exception (see the sketch below).
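For reference, here is a minimal sketch of that workaround, assuming a first-generation Cloud Functions background trigger on a Cloud Storage bucket; the table ID, helper names, and backoff parameters are placeholders, not part of the original comment:

```python
import time

from google.api_core import exceptions
from google.cloud import bigquery

client = bigquery.Client()


def load_with_backoff(uri, table_id, max_attempts=5):
    """Retry the whole load job when BigQuery reports the rate-limit error."""
    for attempt in range(max_attempts):
        try:
            load_job = client.load_table_from_uri(uri, table_id)
            return load_job.result()  # Waits for the job; raises on job error.
        except exceptions.Forbidden:
            # e.g. "403 Exceeded rate limits: too many table update operations
            # for this table" -- back off and try the whole load again.
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...


def on_gcs_upload(event, context):
    """Cloud Functions (1st gen) entry point for a Storage 'finalize' trigger."""
    uri = f"gs://{event['bucket']}/{event['name']}"
    load_with_backoff(uri, "my_project.my_dataset.my_table")
```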
In internal issue 195911158, a customer is struggling to retry jobs that fail with "403 Exceeded rate limits: too many table update operations for this table". One can encounter this exception by attempting to run hundreds of load jobs in parallel.
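As a rough illustration (not taken from the issue), submitting many load jobs against the same destination table concurrently is enough to hit this limit; the bucket, table ID, and worker count below are arbitrary placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my_project.my_dataset.my_table"  # placeholder
URIS = [f"gs://my-bucket/data/part-{i:03d}.csv" for i in range(500)]  # placeholder


def load_one(uri):
    job = client.load_table_from_uri(uri, TABLE_ID)
    # result() surfaces the job error, e.g. "403 Exceeded rate limits:
    # too many table update operations for this table".
    return job.result()


with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(load_one, URIS))
```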
Thoughts:

- Where is the exception thrown from: `result()` or `load_table_from_uri()`? If `result()`, continue with `job_retry`; otherwise, see if we can modify the default retry predicate for `load_table_from_uri()` to recognize this rate-limiting reason and retry.
- If `result()`, modify load jobs (or more likely the base class) to retry when `job_retry` is set, similar to what we do for query jobs.

Notes:
- We'll likely want a separate default `job_retry` object for `load_table_from_uri()`, as the retryable reasons will likely be different than what we have for queries (see the sketch after these notes).
- The other `load_table_from_*` methods are probably not as retryable as `load_table_from_uri()`, since they would require rewinding file objects, which isn't always possible. We'll probably want to consider adding `job_retry` to those load job methods in the future, but for now `load_table_from_uri` is what's needed.
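To make the proposal concrete, here is a sketch of what the requested API could look like. The `job_retry` argument to `load_table_from_uri()` is exactly what this issue asks for and does not exist yet, and the specific exception types and timing values below are assumptions, not a decided design:

```python
from google.api_core import exceptions, retry
from google.cloud import bigquery

client = bigquery.Client()

# A load-specific default retry object, kept separate from the query
# DEFAULT_JOB_RETRY because the retryable reasons are expected to differ.
# A real predicate would likely inspect the error reason (e.g. rate limiting)
# rather than retrying every 403; these types and numbers are illustrative.
LOAD_JOB_RETRY = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.Forbidden,        # e.g. "403 Exceeded rate limits: ..."
        exceptions.TooManyRequests,  # 429
    ),
    initial=1.0,
    maximum=60.0,
    multiplier=2.0,
    deadline=600.0,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/data/part-000.csv",  # placeholder URI
    "my_project.my_dataset.my_table",    # placeholder table ID
    job_retry=LOAD_JOB_RETRY,            # the argument proposed in this issue
)
load_job.result()
```

Keeping this default separate from the query job retry matches the note above that the retryable reasons for load jobs will likely differ from those for queries.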