Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-41680] High memory consumption with debug info generation enabled #5294

Closed
lewurm opened this issue Oct 25, 2022 · 11 comments
Closed

[GR-41680] High memory consumption with debug info generation enabled #5294

lewurm opened this issue Oct 25, 2022 · 11 comments

Comments

@lewurm
Copy link
Member

lewurm commented Oct 25, 2022

Describe the issue
We can't build graalpy with a heap of 16G when debug info is enabled.

Steps to reproduce the issue
Please include both build steps as well as run steps

$ git clone [email protected]:oracle/graal.git
$ git clone [email protected]:oracle/graalpython.git
$ cd graal/vm
$ mx --components=pyn,pynl --dy graalpython,/substratevm --debuginfo-dists build

Describe GraalVM and your environment:

More details
Without -g max RSS is "only" about 10G. Here is a stack trace with the versions specified above:

Example No. 1
[libpythonvm:2617831]  Version info: 'GraalVM 23.0.0-dev Java 17 CE'
[libpythonvm:2617831]  Java version info: '17.0.4+7-jvmci-22.3-b02'
[libpythonvm:2617831]  C compiler: gcc (linux, aarch64, 9.4.0)
[libpythonvm:2617831]  Garbage collector: Serial GC
[libpythonvm:2617831]  6 user-specific feature(s)
[libpythonvm:2617831]  - com.oracle.graal.python.BouncyCastleFeature
[libpythonvm:2617831]  - com.oracle.svm.thirdparty.jline.JLine3Feature
[libpythonvm:2617831]  - com.oracle.svm.truffle.TruffleBaseFeature: Provides base support for Truffle
[libpythonvm:2617831]  - com.oracle.svm.truffle.TruffleFeature: Enables compilation of Truffle ASTs to machine code
[libpythonvm:2617831]  - org.graalvm.home.HomeFinderFeature: Finds GraalVM paths and its version number
[libpythonvm:2617831]  - org.graalvm.launcher.JLine3FeatureModuleAccessFeature
[libpythonvm:2617831] [2/7] Performing analysis...  [*********]                                        (147.4s @ 6.11GB)
[libpythonvm:2617831]   39,514 (97.52%) of 40,519 classes reachable
[libpythonvm:2617831]   60,786 (65.81%) of 92,369 fields reachable
[libpythonvm:2617831]  184,467 (73.64%) of 250,508 methods reachable
[libpythonvm:2617831]   28,010 (11.18%) of 250,508 methods included for runtime compilation
[libpythonvm:2617831]    1,494 classes,   120 fields, and 3,217 methods registered for reflection
[libpythonvm:2617831]       74 classes,   119 fields, and    96 methods registered for JNI access
[libpythonvm:2617831]        4 native libraries: dl, pthread, rt, z
[libpythonvm:2617831] [3/7] Building universe...                                                         (8.1s @ 4.84GB)
[libpythonvm:2617831] [4/7] Parsing methods...      [********]                                          (75.7s @ 4.53GB)
[libpythonvm:2617831] [5/7] Inlining methods...     [****]                                               (6.2s @ 5.69GB)
[libpythonvm:2617831] [6/7] Compiling methods...    [********]                                          (75.0s @ 6.92GB)
[libpythonvm:2617831] [7/7] Creating image...
                                                            (0.0s @ 9.69GB)
[libpythonvm:2617831] GC warning: 277.2s spent in 132 GCs during the last stage, taking up 74.24% of the time.
[libpythonvm:2617831]             Please ensure more than 13.52GB of memory is available for Native Image
[libpythonvm:2617831]             to reduce GC overhead and improve image build time.
Fatal error: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.base/java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:88)
        at java.base/java.lang.StringBuilder.<init>(StringBuilder.java:106)
        at com.oracle.objectfile.debugentry.Range.getExtendedMethodName(Range.java:238)
        at com.oracle.objectfile.debugentry.Range.constructClassAndMethodNameWithParams(Range.java:273)
        at com.oracle.objectfile.debugentry.Range.<init>(Range.java:156)
        at com.oracle.objectfile.debugentry.DebugInfoBase.addSubrange(DebugInfoBase.java:436)
        at com.oracle.objectfile.debugentry.DebugInfoBase.lambda$installDebugInfo$4(DebugInfoBase.java:300)
        at com.oracle.objectfile.debugentry.DebugInfoBase$$Lambda$3404/0x00000001025ffd70.accept(Unknown Source)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
        at com.oracle.objectfile.debugentry.DebugInfoBase.lambda$installDebugInfo$5(DebugInfoBase.java:300)
        at com.oracle.objectfile.debugentry.DebugInfoBase$$Lambda$3401/0x00000001025ff668.accept(Unknown Source)
        at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugCodeInfo.debugContext(NativeImageDebugInfoProvider.java:1215)
        at com.oracle.objectfile.debugentry.DebugInfoBase.lambda$installDebugInfo$6(DebugInfoBase.java:277)
        at com.oracle.objectfile.debugentry.DebugInfoBase$$Lambda$3400/0x00000001025ff430.accept(Unknown Source)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at com.oracle.objectfile.debugentry.DebugInfoBase.installDebugInfo(DebugInfoBase.java:277)
        at com.oracle.objectfile.elf.ELFObjectFile.installDebugInfo(ELFObjectFile.java:1216)
        at com.oracle.svm.hosted.image.NativeImageDebugInfoFeature.beforeImageWrite(NativeImageDebugInfoFeature.java:101)
        at com.oracle.svm.hosted.NativeImageGenerator.lambda$doRun$7(NativeImageGenerator.java:708)
        at com.oracle.svm.hosted.NativeImageGenerator$$Lambda$3369/0x00000001025f6c70.accept(Unknown Source)
        at com.oracle.svm.hosted.FeatureHandler.forEachFeature(FeatureHandler.java:85)
        at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:708)
        at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:535)

Here is an example that happened on graal-enterprise:

Example No. 2
[libpythonvm:158072] [7/7] Creating image...                                                            (0.0s @ 10.87GB)
Fatal error: com.oracle.svm.core.util.VMError$HostedError: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at com.oracle.svm.core.util.VMError.shouldNotReachHere(VMError.java:72)
	at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:695)
	at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:535)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.buildImage(NativeImageGeneratorRunner.java:403)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.build(NativeImageGeneratorRunner.java:580)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner.main(NativeImageGeneratorRunner.java:128)
	at com.oracle.svm.hosted.NativeImageGeneratorRunner$JDK9Plus.main(NativeImageGeneratorRunner.java:610)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at jdk.internal.vm.ci/jdk.vm.ci.hotspot.CompilerToVM.getLineNumberTable(Native Method)
	at jdk.internal.vm.ci/jdk.vm.ci.hotspot.CompilerToVM.getLineNumberTable(CompilerToVM.java:651)
	at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotResolvedJavaMethodImpl.getLineNumberTable(HotSpotResolvedJavaMethodImpl.java:602)
	at com.oracle.graal.pointsto.meta.AnalysisMethod.getLineNumberTable(AnalysisMethod.java:601)
	at com.oracle.svm.hosted.meta.HostedMethod.getLineNumberTable(HostedMethod.java:415)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugBaseMethodInfo.<init>(NativeImageDebugInfoProvider.java:933)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugLocationInfo.<init>(NativeImageDebugInfoProvider.java:1694)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugLocationInfo.<init>(NativeImageDebugInfoProvider.java:1690)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugCodeInfo.createLeafLocationInfo(NativeImageDebugInfoProvider.java:1459)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugCodeInfo$SingleLevelVisitor.process(NativeImageDebugInfoProvider.java:1347)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugCodeInfo$TopLevelVisitor.apply(NativeImageDebugInfoProvider.java:1364)
	at com.oracle.svm.core.code.CompilationResultFrameTree$CallNode.visitChildren(CompilationResultFrameTree.java:404)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugCodeInfo.locationInfoProvider(NativeImageDebugInfoProvider.java:1249)
	at com.oracle.objectfile.debugentry.DebugInfoBase.lambda$installDebugInfo$5(DebugInfoBase.java:300)
	at com.oracle.objectfile.debugentry.DebugInfoBase$$Lambda$4026/0x0000000102a08fb8.accept(Unknown Source)
	at com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugCodeInfo.debugContext(NativeImageDebugInfoProvider.java:1215)
	at com.oracle.objectfile.debugentry.DebugInfoBase.lambda$installDebugInfo$6(DebugInfoBase.java:277)
	at com.oracle.objectfile.debugentry.DebugInfoBase$$Lambda$4025/0x0000000102a08d80.accept(Unknown Source)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at com.oracle.objectfile.debugentry.DebugInfoBase.installDebugInfo(DebugInfoBase.java:277)
	at com.oracle.objectfile.elf.ELFObjectFile.installDebugInfo(ELFObjectFile.java:1216)
	at com.oracle.svm.hosted.image.NativeImage.build(NativeImage.java:476)
	at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:683)
	at com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:535)
[libpythonvm:158072] ------------------------------------------------------------------------------------------------------------------------
[libpythonvm:158072]                       188.2s (14.3% of total time) in 372 GCs | Peak RSS: 16.03GB | CPU load: 6.87
[libpythonvm:158072] ========================================================================================================================
[libpythonvm:158072] Failed generating 'libpythonvm' after 21m 49s.

This one is on graal-enterprise too, but instead of running out of time the watch dog shuts down. Not 100% sure if related, but it feels like it:

Example No. 3
=== Image generator watchdog detected no activity. This can be a sign of a deadlock during image building. Dumping all stack traces. Current time: Wed Oct 12 18:40:51 UTC 2022
"main" Id=1 in RUNNABLE
    at [email protected]/jdk.vm.ci.hotspot.CompilerToVM.lookupType(Native Method)
    at [email protected]/jdk.vm.ci.hotspot.CompilerToVM.lookupType(CompilerToVM.java:247)
    at [email protected]/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.lookupTypeInternal(HotSpotJVMCIRuntime.java:854)
    at [email protected]/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.lookupType(HotSpotJVMCIRuntime.java:841)
    at [email protected]/jdk.vm.ci.hotspot.HotSpotSignature.getParameterType(HotSpotSignature.java:166)
    at app//com.oracle.graal.pointsto.infrastructure.WrappedSignature.getParameterType(WrappedSignature.java:52)
    at app//com.oracle.graal.pointsto.infrastructure.WrappedSignature.getParameterType(WrappedSignature.java:52)
    at app//com.oracle.svm.hosted.image.NativeImageDebugInfoProvider.createParamInfo(NativeImageDebugInfoProvider.java:1134)
    at app//com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugBaseMethodInfo.<init>(NativeImageDebugInfoProvider.java:935)
    at app//com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugHostedMethodInfo.<init>(NativeImageDebugInfoProvider.java:1165)
    at app//com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$NativeImageDebugCodeInfo.<init>(NativeImageDebugInfoProvider.java:1207)
    at app//com.oracle.svm.hosted.image.NativeImageDebugInfoProvider.lambda$codeInfoProvider$1(NativeImageDebugInfoProvider.java:227)
    at app//com.oracle.svm.hosted.image.NativeImageDebugInfoProvider$$Lambda$3902/0x0000000802da9c40.apply(Unknown Source)
    at [email protected]/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at [email protected]/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at [email protected]/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at [email protected]/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at [email protected]/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at [email protected]/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at [email protected]/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at [email protected]/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at app//com.oracle.objectfile.debugentry.DebugInfoBase.installDebugInfo(DebugInfoBase.java:277)
    at app//com.oracle.objectfile.elf.ELFObjectFile.installDebugInfo(ELFObjectFile.java:1216)
    at app//com.oracle.svm.hosted.image.NativeImage.build(NativeImage.java:476)
    at app//com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:683)
    at app//com.oracle.svm.hosted.NativeImageGenerator.run(NativeImageGenerator.java:535)
    at app//com.oracle.svm.hosted.NativeImageGeneratorRunner.buildImage(NativeImageGeneratorRunner.java:403)
    at app//com.oracle.svm.hosted.NativeImageGeneratorRunner.build(NativeImageGeneratorRunner.java:580)
    at app//com.oracle.svm.hosted.NativeImageGeneratorRunner.main(NativeImageGeneratorRunner.java:128)
    at app//com.oracle.svm.hosted.NativeImageGeneratorRunner$JDK9Plus.main(NativeImageGeneratorRunner.java:610)
@lewurm
Copy link
Member Author

lewurm commented Oct 25, 2022

@adinn anything obvious to you that could help to reduce memory consumption?

@olpaw
Copy link
Member

olpaw commented Oct 25, 2022

On aarch64 even for EE, CE debuginfo generation is used. As of now this code is not much optimized to use as little memory as possible. Obviously @adinn would be the best person to look into this.

@olpaw olpaw changed the title High memory consumption with debug info generation enabled [GR-41680] High memory consumption with debug info generation enabled Oct 25, 2022
@adinn
Copy link
Collaborator

adinn commented Nov 3, 2022

@olpaw @lewurm I do know that debug info generation uses a lot of memory. I am not at all clear how much room there is for cutting it down to size. However, as @olpaw says, there has been no real effort so far to optimize this code for space so there is a good chance we can make some immediate savings that are significant (I am bound to have made some dumb design choices :-).

I'll start by measuring the scale of the problem with a few different target apps and try to identify where there is room for savings and how to proceed.

@adinn
Copy link
Collaborator

adinn commented Nov 16, 2022

@lewurm I have a PR prepared (#5436) that makes some immediate improvements to debug info footprint. I have not yet run your test to try it out but I am hopeful it will make a big difference. I'll let you know the results.

@adinn
Copy link
Collaborator

adinn commented Nov 16, 2022

@lewurm I had problems following your instructions to build graalpy on x86_64

[adinn@sputstik vm]$ mx --components=pyn,pynl --dy graalpython,graal/substratevm --debuginfo-dists build
Downloading LLVM_ORG_SRC from ['https://lafo.ssw.uni-linz.ac.at/pub/llvm-org/llvm-src-llvmorg-14.0.6-3-gc7a4a53c32-bgc5e298fd27.tar.gz']
99196728 bytes (100%)
WARNING: No platform-specific definition is available for project com.oracle.truffle.llvm.tests.pthread.native for your architecture (amd64)
 File "/home/adinn/redhat/openjdk/graal/graalpython/mx.graalpython/suite.py", line 142 in definition of NETBEANS-LIB-PROFILER:
Library without "path" attribute must have a non-empty "urls" list attribute or "maven" attribute

Do I have to build it on AArch64? Or is this some problem with the build?

@lewurm
Copy link
Member Author

lewurm commented Nov 16, 2022

Sorry for that @adinn. Can you try git revert b22e8f179ff649b76a94af30881efd7d4340f492 in the graalpython repository? That should unblock you.

@adinn
Copy link
Collaborator

adinn commented Nov 16, 2022

@lewurm No problem. We now appear to be cooking with gas.

@lewurm
Copy link
Member Author

lewurm commented Nov 16, 2022

@lewurm I have a PR prepared (#5436) that makes some immediate improvements to debug info footprint. I have not yet run your test to try it out but I am hopeful it will make a big difference. I'll let you know the results.

Thank you for looking into it! I can confirm that your PR reduces the memory consumption significantly. Numbers:

[libpythonvm:1145800]  126.64MB (22.68%) for code area:   151,213 compilation units
[libpythonvm:1145800]  244.96MB (43.86%) for image heap:2,262,376 objects and 4,341 resources
[libpythonvm:1145800]  176.52MB (31.61%) for debug info generated in 55.8s
[libpythonvm:1145800]   10.33MB ( 1.85%) for other data
[libpythonvm:1145800]  558.45MB in total
[...]
[libpythonvm:1145800]                       65.4s (14.8% of total time) in 205 GCs | Peak RSS: 13.22GB | CPU load: 3.74

While previously it would run out of memory (16gb max heap).

@adinn
Copy link
Collaborator

adinn commented Nov 16, 2022

@lewurm Thanks for testing it. I was about to report the same outcome on x86_64. Before and after runs both completed. However, the memory stats for the compile phase show a very noticeable drop in the amount of data being generated.

Before

[libpythonvm:298043] [7/7] Creating image...                    (50.3s @ 8.16GB)

After

[libpythonvm:318202] [7/7] Creating image...                    (55.7s @ 6.37GB)

@lewurm
Copy link
Member Author

lewurm commented Nov 23, 2022

@adinn should we close this ticket? Or do you want to keep it open until you have explored other options as mentioned e.g. in #5436 (comment)

@adinn
Copy link
Collaborator

adinn commented Nov 23, 2022

@lewurm I'm happy to close this and raise another issue to cover further changes, some of which are currently in progress (#5475) and some of which are still to be investigated.

@lewurm lewurm closed this as completed Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants