Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big query retries during create read session #15013

Conversation

vlad-lyutenko
Copy link
Contributor

@vlad-lyutenko vlad-lyutenko commented Nov 14, 2022

Description

After some code investigation I found out that we have some logic of retries in bigquery connector. But as far as I understand this retry logic is applied only after we successfully create read session and executing read itself (during getting next page) :

@Overridepublic ReadRowsResponse next()
{
    do {
        try {
            ReadRowsResponse response = serverResponses.next();
              ...
        }
        catch (Exception e) {
            // if relevant, retry the read, from the last read position            
            if (BigQueryUtil.isRetryable(e) && retries < helper.maxReadRowsRetries) {
                log.debug("Request failed, retrying: %s", e);
                serverResponses = helper.fetchResponses(nextOffset);
                retries++;
            }
            ...
        }
    }
    while (serverResponses.hasNext());

Similar exception could occur during Read Session creation.
So this PR introduce exact same retry logic for read session creation as we have for read pages.

Maybe we should go with Failsafe here.

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# BigQuery
* Add support for retrying to create a BigQuery read session. ({issue}`15013`)

@cla-bot cla-bot bot added the cla-signed label Nov 14, 2022
@wendigo
Copy link
Contributor

wendigo commented Nov 14, 2022

@vlad-lyutenko let's use Failsafe here and for the other case as well (separate commit will be needed)

@ebyhr
Copy link
Member

ebyhr commented Nov 15, 2022

Similar exception could occur during Read Session creation.

Have you already faced the issue or is it an assumption?

@hashhar
Copy link
Member

hashhar commented Nov 15, 2022

@ebyhr It's seen occasionally on CI but I can't find a recent link - here's a stack trace:

java.lang.AssertionError: Execution of 'actual' query failed: SELECT max(comment) FROM nation
	at org.testng.Assert.fail(Assert.java:83)
	at io.trino.testing.QueryAssertions.assertQuery(QueryAssertions.java:150)
	at io.trino.testing.QueryAssertions.assertQuery(QueryAssertions.java:106)
	at io.trino.testing.AbstractTestQueryFramework.assertQuery(AbstractTestQueryFramework.java:169)
	at io.trino.testing.AbstractTestQueryFramework.assertQuery(AbstractTestQueryFramework.java:164)
	at io.trino.testing.AbstractTestIntegrationSmokeTest.testAggregation(AbstractTestIntegrationSmokeTest.java:63)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error
	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:122)
	at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:505)
	at io.trino.testing.QueryAssertions.assertQuery(QueryAssertions.java:147)
	... 17 more
	Suppressed: java.lang.Exception: SQL: SELECT max(comment) FROM nation
		at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:508)
		... 18 more
Caused by: com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error
	at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:67)
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
	at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1133)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1277)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:808)
	at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
	at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
	at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
	at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:557)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:738)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:717)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
	Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
		at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)
		at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
		at com.google.cloud.bigquery.storage.v1.BigQueryReadClient.createReadSession(BigQueryReadClient.java:232)
		at io.trino.plugin.bigquery.ReadSessionCreator.create(ReadSessionCreator.java:73)
		at io.trino.plugin.bigquery.BigQuerySplitManager.readFromBigQuery(BigQuerySplitManager.java:116)
		at io.trino.plugin.bigquery.BigQuerySplitManager.getSplits(BigQuerySplitManager.java:98)
		at io.trino.spi.connector.ConnectorSplitManager.getSplits(ConnectorSplitManager.java:37)
		at io.trino.split.SplitManager.getSplits(SplitManager.java:75)
		at io.trino.sql.planner.SplitSourceFactory$Visitor.visitScanAndFilter(SplitSourceFactory.java:183)
		at io.trino.sql.planner.SplitSourceFactory$Visitor.visitTableScan(SplitSourceFactory.java:157)
		at io.trino.sql.planner.SplitSourceFactory$Visitor.visitTableScan(SplitSourceFactory.java:128)
		at io.trino.sql.planner.plan.TableScanNode.accept(TableScanNode.java:222)
		at io.trino.sql.planner.SplitSourceFactory$Visitor.visitAggregation(SplitSourceFactory.java:282)
		at io.trino.sql.planner.SplitSourceFactory$Visitor.visitAggregation(SplitSourceFactory.java:128)
		at io.trino.sql.planner.plan.AggregationNode.accept(AggregationNode.java:203)
		at io.trino.sql.planner.SplitSourceFactory.createSplitSources(SplitSourceFactory.java:108)
		at io.trino.execution.scheduler.SqlQueryScheduler$PipelinedDistributedStagesScheduler.createStageScheduler(SqlQueryScheduler.java:1317)
		at io.trino.execution.scheduler.SqlQueryScheduler$PipelinedDistributedStagesScheduler.create(SqlQueryScheduler.java:1197)
		at io.trino.execution.scheduler.SqlQueryScheduler.createDistributedStagesScheduler(SqlQueryScheduler.java:345)
		at io.trino.execution.scheduler.SqlQueryScheduler.start(SqlQueryScheduler.java:311)
		at io.trino.execution.SqlQueryExecution.start(SqlQueryExecution.java:422)
		at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:243)
		at io.trino.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
		at io.trino.$gen.Trino_testversion____20220215_094042_6.run(Unknown Source)
		... 3 more
Caused by: io.grpc.StatusRuntimeException: INTERNAL: request failed: internal error
	at io.grpc.Status.asRuntimeException(Status.java:535)
	... 13 more


return readSession;
int retries = 0;
do {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use failsafe

@@ -36,7 +36,8 @@
private static final Set<String> INTERNAL_ERROR_MESSAGES = ImmutableSet.of(
"HTTP/2 error code: INTERNAL_ERROR",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any failsafe policy that is based on this that you can reuse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot find, and looks like I can't move failsafe policy to Utility class, because retry count is fetched from config

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/bigquery-retry-create-read-session branch from 2acf942 to 69f0634 Compare November 15, 2022 12:44
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the commit title as:

Support retrying to create BigQuery read session

"Big query" isn't the correct product name and please follow https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages

@vlad-lyutenko vlad-lyutenko force-pushed the vlad-lyutenko/bigquery-retry-create-read-session branch from 69f0634 to 8054f84 Compare November 15, 2022 14:15
@vlad-lyutenko
Copy link
Contributor Author

Please fix the commit title as:

Support retrying to create BigQuery read session

"Big query" isn't the correct product name and please follow https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages

fixed

@ebyhr ebyhr merged commit 1258019 into trinodb:master Nov 16, 2022
@ebyhr
Copy link
Member

ebyhr commented Nov 16, 2022

Merged, thanks!

@ebyhr ebyhr mentioned this pull request Nov 16, 2022
@github-actions github-actions bot added this to the 404 milestone Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants