Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky OutOfMemoryError in trino-hive tests #14248

Open
kokosing opened this issue Sep 22, 2022 · 10 comments · Fixed by #15471
Open

Flaky OutOfMemoryError in trino-hive tests #14248

kokosing opened this issue Sep 22, 2022 · 10 comments · Fixed by #15471
Labels
bug Something isn't working test

Comments

@kokosing
Copy link
Member

2022-09-21T17:44:19.511-0500	INFO	dispatcher-query-130	io.trino.event.QueryMonitor	TIMELINE: Query 20220921_224419_04861_mcxms :: FINISHED :: elapsed 2ms :: planning 0ms :: waiting 0ms :: scheduling 2ms :: running 0ms :: finishing 2ms :: begin 2022-09-21T17:44:19.508-05:00 :: end 2022-09-21T17:44:19.510-05:00
2022-09-21T17:44:19.518-0500	INFO	dispatcher-query-279	io.trino.event.QueryMonitor	TIMELINE: Query 20220921_224419_04862_mcxms :: FINISHED :: elapsed 4ms :: planning 0ms :: waiting 0ms :: scheduling 4ms :: running 0ms :: finishing 4ms :: begin 2022-09-21T17:44:19.513-05:00 :: end 2022-09-21T17:44:19.517-05:00
2022-09-21T17:44:19.518-0500	INFO	pool-3-thread-1	io.trino.testng.services.ProgressLoggingListener	[TEST SUCCESS] io.trino.plugin.hive.TestHiveConnectorTest.testMismatchedBucketing; (took: 2.3 seconds)
Error:  java.lang.OutOfMemoryError: Java heap space
Error:  Terminating due to java.lang.OutOfMemoryError: Java heap space
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
Warning:  Corrupted STDOUT by directly writing to native stream in forked JVM 1. See FAQ web page and the dump file /home/runner/work/trino/trino/plugin/trino-hive/target/surefire-reports/2022-09-21T22-37-11_874-jvmRun1.dumpstream
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  08:06 min
[INFO] Finished at: 2022-09-21T22:45:14Z
[INFO] ------------------------------------------------------------------------
Error:  Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project trino-hive: There are test failures.
Error:  
Error:  Please refer to /home/runner/work/trino/trino/plugin/trino-hive/target/surefire-reports for the individual test results.
Error:  Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Error:  The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error:  Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-hive && /opt/hostedtoolcache/Java_Zulu_jdk/17.0.4-8/x64/bin/java -Duser.timezone=America/Bahia_Banderas -Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire/surefirebooter9501654866746885061.jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire 2022-09-21T22-37-11_874-jvmRun1 surefire5685078[9106](https://github.com/trinodb/trino/actions/runs/3101526176/jobs/5022994159#step:8:9107)93083020tmp surefire_010594696820239026109tmp
Error:  Error occurred in starting fork, check output in log
Error:  Process Exit Code: 3
Error:  Crashed tests:
Error:  io.trino.plugin.hive.TestHiveConnectorTest
Error:  io.trino.plugin.hive.benchmark.TestHiveFileFormatBenchmark
Error:  org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error:  Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-hive && /opt/hostedtoolcache/Java_Zulu_jdk/17.0.4-8/x64/bin/java -Duser.timezone=America/Bahia_Banderas -Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire/surefirebooter9501654866746885061.jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire 2022-09-21T22-37-11_874-jvmRun1 surefire5685078910693083020tmp surefire_010594696820239026109tmp
Error:  Error occurred in starting fork, check output in log
Error:  Process Exit Code: 3
Error:  Crashed tests:
Error:  io.trino.plugin.hive.TestHiveConnectorTest
Error:  io.trino.plugin.hive.benchmark.TestHiveFileFormatBenchmark
@findepi
Copy link
Member

findepi commented Nov 9, 2022

https://github.com/trinodb/trino/actions/runs/3413229175/jobs/5680320052

2022-11-07T22:19:00.2005817Z 2022-11-07T16:19:00.197-0600	INFO	pool-3-thread-2	io.trino.testng.services.ProgressLoggingListener	[TEST SUCCESS] io.trino.plugin.hive.TestHiveConnectorTest.testMismatchedBucketing; (took: 4.2 seconds)
2022-11-07T22:19:00.2025380Z 2022-11-07T16:19:00.199-0600	INFO	pool-3-thread-2	io.trino.testng.services.ProgressLoggingListener	[TEST START] io.trino.plugin.hive.TestHiveConnectorTest.testMultipleWritersWhenTaskScaleWritersIsEnabled
2022-11-07T22:19:53.3523418Z [ERROR] java.lang.OutOfMemoryError: Java heap space
2022-11-07T22:20:09.9412064Z [ERROR] Terminating due to java.lang.OutOfMemoryError: Java heap space
2022-11-07T22:20:10.4866063Z [INFO] 
2022-11-07T22:20:10.4867229Z [INFO] Results:
2022-11-07T22:20:10.4867875Z [INFO] 
2022-11-07T22:20:10.4872553Z [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
2022-11-07T22:20:10.4872879Z [INFO] 
2022-11-07T22:20:10.4891152Z [WARNING] Corrupted STDOUT by directly writing to native stream in forked JVM 1. See FAQ web page and the dump file /home/runner/work/trino/trino/plugin/trino-hive/target/surefire-reports/2022-11-07T21-58-41_345-jvmRun1.dumpstream
2022-11-07T22:20:10.5065507Z [INFO] ------------------------------------------------------------------------
2022-11-07T22:20:10.5070353Z [INFO] BUILD FAILURE
2022-11-07T22:20:10.5070916Z [INFO] ------------------------------------------------------------------------
2022-11-07T22:20:10.5088723Z [INFO] Total time:  21:35 min
2022-11-07T22:20:10.5089264Z [INFO] Finished at: 2022-11-07T22:20:10Z
2022-11-07T22:20:10.5089747Z [INFO] ------------------------------------------------------------------------
2022-11-07T22:20:10.5131306Z [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project trino-hive: There are test failures.
2022-11-07T22:20:10.5131794Z [ERROR] 
2022-11-07T22:20:10.5132596Z [ERROR] Please refer to /home/runner/work/trino/trino/plugin/trino-hive/target/surefire-reports for the individual test results.
2022-11-07T22:20:10.5133317Z [ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
2022-11-07T22:20:10.5133872Z [ERROR] The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
2022-11-07T22:20:10.5136510Z [ERROR] Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-hive && /opt/hostedtoolcache/Java_Zulu_jdk/17.0.5-8/x64/bin/java -Duser.timezone=America/Bahia_Banderas -Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire/surefirebooter7512793082474777651.jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire 2022-11-07T21-58-41_345-jvmRun1 surefire4615576579227982688tmp surefire_14472171599933126456tmp
2022-11-07T22:20:10.5137774Z [ERROR] Error occurred in starting fork, check output in log
2022-11-07T22:20:10.5138100Z [ERROR] Process Exit Code: 3
2022-11-07T22:20:10.5138369Z [ERROR] Crashed tests:
2022-11-07T22:20:10.5138767Z [ERROR] io.trino.plugin.hive.TestHiveConnectorTest
2022-11-07T22:20:10.5139316Z [ERROR] io.trino.plugin.hive.benchmark.TestHiveFileFormatBenchmark
2022-11-07T22:20:10.5140069Z [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
2022-11-07T22:20:10.5142413Z [ERROR] Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-hive && /opt/hostedtoolcache/Java_Zulu_jdk/17.0.5-8/x64/bin/java -Duser.timezone=America/Bahia_Banderas -Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire/surefirebooter7512793082474777651.jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire 2022-11-07T21-58-41_345-jvmRun1 surefire4615576579227982688tmp surefire_14472171599933126456tmp
2022-11-07T22:20:10.5143480Z [ERROR] Error occurred in starting fork, check output in log
2022-11-07T22:20:10.5143802Z [ERROR] Process Exit Code: 3
2022-11-07T22:20:10.5144170Z [ERROR] Crashed tests:
2022-11-07T22:20:10.5146076Z [ERROR] io.trino.plugin.hive.TestHiveConnectorTest
2022-11-07T22:20:10.5146585Z [ERROR] io.trino.plugin.hive.benchmark.TestHiveFileFormatBenchmark

@findepi
Copy link
Member

findepi commented Nov 16, 2022

see [TEST START] io.trino.plugin.hive.TestHiveConnectorTest.testMultipleWritersWhenTaskScaleWritersIsEnabled in the above log transcript

this is a test added in #13111
cc @gaurav8297 @sopel39

@findepi
Copy link
Member

findepi commented Nov 28, 2022

#15165 is merged now. Let's see whether is helps anything.

#15181 is also merged, so hopefully next time tests OOM we will have some data to work with.

@ebyhr
Copy link
Member

ebyhr commented Dec 5, 2022

testing/trino-faulttolerant-tests, test-fault-tolerant-hive-2 failed with OutOfMemoryError.
https://github.com/trinodb/trino/actions/runs/3616948145/jobs/6095378984

@findepi
Copy link
Member

findepi commented Dec 5, 2022

@ebyhr this is a different test module, so I moved this into a separate issue: #15293

@findepi
Copy link
Member

findepi commented Dec 7, 2022

#15165 is merged now. Let's see whether is helps anything.

No new reports since then. Let me optimistically close this. Let's reopen if problem reoccurs.

@findepi findepi closed this as completed Dec 7, 2022
@ebyhr
Copy link
Member

ebyhr commented Dec 20, 2022

Let me reopen as plugin/trino-hive failed once again due to OOM.
https://github.com/trinodb/trino/actions/runs/3737995577/jobs/6343962682

Error:  Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project trino-hive: There are test failures.
Error:  
Error:  Please refer to /home/runner/work/trino/trino/plugin/trino-hive/target/surefire-reports for the individual test results.
Error:  Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Error:  The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error:  Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-hive && /opt/hostedtoolcache/Java_Zulu_jdk/17.0.5-8/x64/bin/java -Duser.timezone=America/Bahia_Banderas -Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire/surefirebooter14991027088616711536.jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire 2022-12-20T06-59-59_753-jvmRun1 surefire7705094419763057625tmp surefire_18523579192431898025tmp
Error:  Error occurred in starting fork, check output in log
Error:  Process Exit Code: 3
Error:  Crashed tests:
Error:  io.trino.plugin.hive.TestHiveConnectorTest
Error:  io.trino.plugin.hive.TestOrcPageSourceMemoryTracking
Error:  org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error:  Command was /bin/sh -c cd /home/runner/work/trino/trino/plugin/trino-hive && /opt/hostedtoolcache/Java_Zulu_jdk/17.0.5-8/x64/bin/java -Duser.timezone=America/Bahia_Banderas -Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8 -Xmx3g -Xms3g -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:-OmitStackTraceInFastThrow -jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire/surefirebooter14991027088616711536.jar /home/runner/work/trino/trino/plugin/trino-hive/target/surefire 2022-12-20T06-59-59_753-jvmRun1 surefire7705094419763057625tmp surefire_18523579192431898025tmp
Error:  Error occurred in starting fork, check output in log
Error:  Process Exit Code: 3
Error:  Crashed tests:
Error:  io.trino.plugin.hive.TestHiveConnectorTest
Error:  io.trino.plugin.hive.TestOrcPageSourceMemoryTracking
Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:748)
Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:305)
Error:  	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:265)
Error:  	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1314)
Error:  	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1159)
Error:  	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:932)
Error:  	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
Error:  	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
Error:  	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
Error:  	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
Error:  	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
Error:  	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
Error:  	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
Error:  	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
Error:  	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
Error:  	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
Error:  	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
Error:  	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:972)
Error:  	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:293)
Error:  	at org.apache.maven.cli.MavenCli.main(MavenCli.java:196)
Error:  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Error:  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
Error:  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Error:  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
Error:  	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
Error:  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Error:  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
Error:  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Error:  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
Error:  	at org.apache.maven.wrapper.BootstrapMainStarter.start(BootstrapMainStarter.java:47)
Error:  	at org.apache.maven.wrapper.WrapperExecutor.execute(WrapperExecutor.java:156)
Error:  	at org.apache.maven.wrapper.MavenWrapperMain.main(MavenWrapperMain.java:72)
Error:  -> [Help 1]

@findepi
Copy link
Member

findepi commented Dec 20, 2022

Let me reopen as plugin/trino-hive failed once again due to OOM.
https://github.com/trinodb/trino/actions/runs/3737995577/jobs/6343962682

I looked into that heap dump and tried to understand what's going on, helped by @skrzypo987 and @losipiuk too.

The heap dump contains thread dump and from the thread dump it can be seen that TestHiveConnectorTest.testMultipleWritersWhenTaskScaleWritersIsEnabled and some TestOrcPageSourceMemoryTracking method where being executed concurrently. The former can be run with 200m heap (or lower) and the latter needs at most 500m, so they should both fit into 3g heap we have for the tests.

We noticed there is a large number of byte[] of size bigger than half of default G1 region size (actually bigger then the region size), which probably may cause a lot of heap to go wasted.
I don't know whether changing region size will help, but I propose we give it a try -- #15471

@pajaks
Copy link
Member

pajaks commented Jan 3, 2023

@losipiuk losipiuk reopened this Jan 3, 2023
@losipiuk
Copy link
Member

losipiuk commented Jan 3, 2023

reopoened (you can do it yourself next time :) )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working test
Development

Successfully merging a pull request may close this issue.

5 participants