Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Use the same mysql transaction to handle multiple database operations #3112

Open
2 tasks done
rfyu opened this issue Aug 19, 2024 · 1 comment · May be fixed by #3114
Open
2 tasks done

[Bug]: Use the same mysql transaction to handle multiple database operations #3112

rfyu opened this issue Aug 19, 2024 · 1 comment · May be fixed by #3114
Labels
type:bug Something isn't working

Comments

@rfyu
Copy link
Contributor

rfyu commented Aug 19, 2024

What happened?

Although multiple SQLs can be submitted at one time through NestedSqlSession, they are still divided into multiple transactions on the mysql side. Therefore, if there are multiple SQL operating on the same row of data in a batch of SQL, mutual blocking problems may occur.

Affects Versions

master

What table formats are you seeing the problem on?

Iceberg

What engines are you seeing the problem on?

AMS

How to reproduce

No response

Relevant log output

2024-08-09 21:39:56,155 ERROR [thrift-server-optimize-manager-184] [com.netease.arctic.server.optimizing.OptimizingQueue] [] - accept result error:
com.netease.arctic.server.exception.PersistenceException: org.apache.ibatis.exceptions.PersistenceException:
### Error updating database.  Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
### The error may exist in com/netease/arctic/server/persistence/mapper/OptimizingMapper.java (best guess)
### The error may involve com.netease.arctic.server.persistence.mapper.OptimizingMapper.updateTaskRuntime-Inline
### The error occurred while setting parameters
### SQL: UPDATE task_runtime SET retry_num = ?, start_time = ?, end_time = ?, cost_time = ?, status = ?, fail_reason = ?, optimizer_token = ?, thread_id = ?, rewrite_output = ?, metrics_summary = ?, properties = ? WHERE process_id = ? AND task_id = ?
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
        at com.netease.arctic.server.exception.ArcticRuntimeException.wrap(ArcticRuntimeException.java:141) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.persistence.PersistentBase.doAs(PersistentBase.java:70) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.optimizing.TaskRuntime.persistTaskRuntime(TaskRuntime.java:315) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.optimizing.TaskRuntime.lambda$reset$1(TaskRuntime.java:123) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_121]
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_121]
        at com.netease.arctic.server.persistence.PersistentBase.doAsTransaction(PersistentBase.java:78) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.persistence.StatedPersistentBase.invokeConsisitency(StatedPersistentBase.java:52) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.optimizing.TaskRuntime.reset(TaskRuntime.java:111) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.optimizing.OptimizingQueue.retryTask(OptimizingQueue.java:294) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.optimizing.OptimizingQueue$TableOptimizingProcess.acceptResult(OptimizingQueue.java:438) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.optimizing.TaskRuntime.lambda$complete$0(TaskRuntime.java:103) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) [?:1.8.0_121]
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) [?:1.8.0_121]
        at com.netease.arctic.server.persistence.PersistentBase.doAsTransaction(PersistentBase.java:78) [amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.persistence.StatedPersistentBase.invokeConsisitency(StatedPersistentBase.java:52) [amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.optimizing.TaskRuntime.complete(TaskRuntime.java:76) [amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.DefaultOptimizingService.completeTask(DefaultOptimizingService.java:232) [amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at sun.reflect.GeneratedMethodAccessor143.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
        at com.netease.arctic.server.utils.ThriftServiceProxy.invoke(ThriftServiceProxy.java:56) [amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.sun.proxy.$Proxy59.completeTask(Unknown Source) [?:?]
        at com.netease.arctic.ams.api.OptimizingService$Processor$completeTask.getResult(OptimizingService.java:583) [amoro-ams-api-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.ams.api.OptimizingService$Processor$completeTask.getResult(OptimizingService.java:562) [amoro-ams-api-0.7.0-SNAPSHOT.jar:?]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) [libthrift-0.13.0.jar:0.13.0]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) [libthrift-0.13.0.jar:0.13.0]
        at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:138) [libthrift-0.13.0.jar:0.13.0]
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) [libthrift-0.13.0.jar:0.13.0]
        at org.apache.thrift.server.Invocation.run(Invocation.java:18) [libthrift-0.13.0.jar:0.13.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: org.apache.ibatis.exceptions.PersistenceException:
### Error updating database.  Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
### The error may exist in com/netease/arctic/server/persistence/mapper/OptimizingMapper.java (best guess)
### The error may involve com.netease.arctic.server.persistence.mapper.OptimizingMapper.updateTaskRuntime-Inline
### The error occurred while setting parameters
### SQL: UPDATE task_runtime SET retry_num = ?, start_time = ?, end_time = ?, cost_time = ?, status = ?, fail_reason = ?, optimizer_token = ?, thread_id = ?, rewrite_output = ?, metrics_summary = ?, properties = ? WHERE process_id = ? AND task_id = ?
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
        at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:199) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:67) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.binding.MapperProxy$PlainMethodInvoker.invoke(MapperProxy.java:152) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:85) ~[mybatis-3.5.6.jar:3.5.6]
        at com.sun.proxy.$Proxy55.updateTaskRuntime(Unknown Source) ~[?:?]
        at com.netease.arctic.server.optimizing.TaskRuntime.lambda$persistTaskRuntime$5(TaskRuntime.java:315) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.persistence.PersistentBase.doAs(PersistentBase.java:66) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        ... 31 more
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:123) ~[mysql-connector-java-8.0.30.jar:8.0.30]
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) ~[mysql-connector-java-8.0.30.jar:8.0.30]
        at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916) ~[mysql-connector-java-8.0.30.jar:8.0.30]
        at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:354) ~[mysql-connector-java-8.0.30.jar:8.0.30]
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94) ~[commons-dbcp2-2.9.0.jar:2.9.0]
        at org.apache.commons.dbcp2.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:94) ~[commons-dbcp2-2.9.0.jar:2.9.0]
        at sun.reflect.GeneratedMethodAccessor95.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
        at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:59) ~[mybatis-3.5.6.jar:3.5.6]
        at com.sun.proxy.$Proxy34.execute(Unknown Source) ~[?:?]
        at org.apache.ibatis.executor.statement.PreparedStatementHandler.update(PreparedStatementHandler.java:47) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.executor.statement.RoutingStatementHandler.update(RoutingStatementHandler.java:74) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.executor.SimpleExecutor.doUpdate(SimpleExecutor.java:50) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:67) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.binding.MapperProxy$PlainMethodInvoker.invoke(MapperProxy.java:152) ~[mybatis-3.5.6.jar:3.5.6]
        at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:85) ~[mybatis-3.5.6.jar:3.5.6]
        at com.sun.proxy.$Proxy55.updateTaskRuntime(Unknown Source) ~[?:?]
        at com.netease.arctic.server.optimizing.TaskRuntime.lambda$persistTaskRuntime$5(TaskRuntime.java:315) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        at com.netease.arctic.server.persistence.PersistentBase.doAs(PersistentBase.java:66) ~[amoro-ams-server-0.7.0-SNAPSHOT.jar:?]
        ... 31 more

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
@rfyu rfyu added the type:bug Something isn't working label Aug 19, 2024
@zhoujinsong
Copy link
Contributor

I am not sure what the reason is for waiting for the task_runtime table block.

A possible answer is we are trying to delete expired task_runtimes at that time with SQL:

delete from task_runtime where table_id =and process_id < ?

Maybe we can check this next time the error occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants