This repository has been archived by the owner on Jul 8, 2022. It is now read-only.

Bump JNA to 5.7.0 #165

Merged
merged 4 commits into from
Feb 12, 2021


emign
Contributor

@emign emign commented Feb 11, 2021

Bump JNA Version to 5.7.0 for Apple Silicon Support
Fixes: korlibs/korge#335

@emign emign requested a review from soywiz February 11, 2021 20:22
@emign
Contributor Author

emign commented Feb 11, 2021

Only tested on my machine with Azul aarch64 1.8 JDK
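For anyone repeating the test, it helps to confirm which kind of JVM is actually running. A minimal sketch, assuming the standard `os.arch` system property is enough to tell them apart (a native build typically reports `aarch64`, while an x64 JDK under Rosetta reports `x86_64`); `JvmArch` and `isArm64` are made-up names for illustration:

```java
// Sketch only: JvmArch and isArm64 are illustrative names, not part of JNA or KorGE.
final class JvmArch {

    /** Returns true when an os.arch value denotes 64-bit ARM. */
    static boolean isArm64(String osArch) {
        if (osArch == null) {
            return false;
        }
        String arch = osArch.toLowerCase(java.util.Locale.ROOT);
        return arch.equals("aarch64") || arch.equals("arm64");
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        System.out.println(os + " / " + arch + " / native arm64 JVM: " + isArm64(arch));
    }
}
```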

@soywiz
Member

soywiz commented Feb 12, 2021

Thanks!

And with this, everything works on Apple Silicon? If that's the case, that's amazing! Though it could make sense, since there is only JNA and no JNI at all.

Although I expect they have tested it and have CI, let me check in the coming days that everything still works on the typical targets.

@emign
Contributor Author

emign commented Feb 12, 2021

Well, it compiles a working JVM target for the aarch64 JDK/JVM from Azul.
That fixes the mentioned issue at least :)

What else is there to be discovered? I don't know :)

I wanted to introduce a GitHub Action for Apple Silicon for korlibs, but there isn't one at the moment.

@soywiz
Member

soywiz commented Feb 12, 2021

For the record, this is the merged PR: java-native-access/jna#1297

BTW Nico, I'm just curious: have you tried the sprites10k sample or the bunnymark one from this repo, and do you have any numbers on CPU usage/FPS with an emulated x64 Java runtime versus the native arm64 version? As far as I know the M1 is lightning fast, and with the GPU and memory embedded on the SoC plus 5nm lithography, this should be a beast.

@emign
Contributor Author

emign commented Feb 12, 2021

Sorry, I don't have anything objective or scientifically accurate. The debug window (F7) cannot be opened on macOS 11.2 because it freezes the JVM app.

JVM: 100,000 bunnies: around 42-44% CPU
JVM aarch64 with 100,000 bunnies: around 40-42%

The compile times are SIGNIFICANTLY shorter on a native aarch64 JDK than on the x86 one. I'm talking about a factor of 10 for the bunnymark.

@soywiz
Member

soywiz commented Feb 12, 2021

Awesome. And I assume you are talking about a clean build, right? Not one using prebuilt classes and such?

Bunnymark should be both GPU- and CPU-bound, I guess. For JNA I cannot use the native approach, since some of the OpenGL symbols are dynamically loaded via another OpenGL call, and I don't know a way to rebind them via code. Though I guess that only matters in proportion to the number of batches, and in bunnymark it is pretty small.

But well, it is using alpha blending and other things that could be optimized. Have you tried not using a Retina resolution to see if the FPS changes? I usually use this: https://github.com/avibrazil/RDM to switch between Retina and non-Retina resolutions, to check that everything works. Retina has 4 times the number of pixels, and in the latest versions it also does 4x MSAA, I believe (not sure if that happens on macOS too, but I guess we could drop it when using Retina resolutions).

@emign
Contributor Author

emign commented Feb 12, 2021

I could do some more research when it is implemented in KorGE itself. That would make testing much easier for me. IntelliJ in the aarch64 version does not like the HUGE KorGE-next project. Maybe it's Gradle too; it will get aarch64 support in Gradle 7.0.

EDIT: The good news is that the KorGE next repo seems to work well with the 7.0 version of Gradle (gradle-7.0-20210211230048+0000 at least).

@soywiz soywiz merged commit 846be7d into soywiz-archive:master Feb 12, 2021
@soywiz
Member

soywiz commented Feb 12, 2021

apple_m1

apple_m1_2

Seems to work properly on my machines, thanks Nico! :)

@emign
Contributor Author

emign commented Feb 13, 2021

Azul Apple Silicon JDK: FPS drops below 50 at around 250k bunnies. Under 2 seconds from launch to window on the third build.
AdoptOpenJDK Rosetta/x64: FPS drops below 50 at around 90k bunnies. 8 seconds from launch to window on the third build.

BTW: I can work better now with the KorGE Next project if we want to test stuff out. I forgot to increase the IDEA memory limit since it is a new installation.

@soywiz
Member

soywiz commented Feb 13, 2021

That's around 3-4x, nice. There are some improvements pending on both the GPU and the CPU side.
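For reference, the arithmetic behind that estimate can be spelled out with the figures quoted above. This is only a sketch; `BunnySpeedup` is an illustrative name. The sprite capacity ratio comes out at roughly 2.8x, and since the native startup was "under 2 seconds", the startup ratio of 4x is a lower bound.

```java
// Sketch only: BunnySpeedup is an illustrative name; the inputs are the figures quoted above.
final class BunnySpeedup {

    /** Plain ratio between two measurements of the same quantity. */
    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Sprites sustained before dropping below 50 FPS: native vs. Rosetta.
        double sprites = ratio(250_000, 90_000);
        // Seconds until the window appears: Rosetta vs. native ("under 2 seconds").
        double startup = ratio(8, 2);
        System.out.println("sprite capacity ratio: " + sprites);
        System.out.println("startup time ratio (lower bound): " + startup);
    }
}
```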

@soywiz
Member

soywiz commented Feb 14, 2021

IDEA is suffering a lot with korge-next. I have simplified it a bit by not including the IntelliJ plugin here.

Regarding optimizations:

It would be possible to use geometry shaders, emitting just points and generating quads from them, effectively reducing the GPU bandwidth requirements to almost 1/4 and avoiding lots of floating-point work on the CPU.
But KorGE is not going to support geometry shaders in the mid term because neither WebGL1 nor WebGL2 supports them, so until something like WebGPU (similar to Vulkan/Metal) is available and mainstream, I probably won't add support for anything more advanced.
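To make the "almost 1/4" figure concrete, here is a rough back-of-the-envelope sketch. The byte sizes are hypothetical and do not describe KorGE's actual vertex layout: a quad needs four full vertices, while a point needs one vertex plus a little extra data such as the sprite size.

```java
// Sketch only: the class name and the byte sizes used in main are hypothetical,
// not KorGE's actual vertex layout.
final class VertexBandwidth {

    /** Bytes uploaded when every sprite is sent as a quad of four full vertices. */
    static long quadBytes(long sprites, int bytesPerVertex) {
        return sprites * 4L * bytesPerVertex;
    }

    /** Bytes uploaded when every sprite is sent as one point expanded on the GPU. */
    static long pointBytes(long sprites, int bytesPerPoint) {
        return sprites * (long) bytesPerPoint;
    }

    public static void main(String[] args) {
        long sprites = 100_000;
        int bytesPerVertex = 20; // assumed: x, y, u, v as floats plus a packed RGBA color
        int bytesPerPoint = 24;  // assumed: the same data plus one float for the sprite size
        long quads = quadBytes(sprites, bytesPerVertex);
        long points = pointBytes(sprites, bytesPerPoint);
        System.out.println("quads:  " + quads + " bytes per frame");
        System.out.println("points: " + points + " bytes per frame");
        System.out.println("points/quads: " + (double) points / quads);
    }
}
```

With these assumed sizes the point path uploads 0.3 times the data of the quad path, slightly above 1/4 because each point carries the extra size field.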

In the meantime there is a trick to achieve a similar effect that theoretically requires two widely available extensions:

https://developer.mozilla.org/en-US/docs/Web/API/ANGLE_instanced_arrays

https://developer.mozilla.org/en-US/docs/Web/API/OES_texture_float

But it would require a value greater than zero for this property: MAX_VERTEX_TEXTURE_IMAGE_UNITS

We could create a quad and render lots of instances of it, then use a texture to read all the properties of every instance, but that requires texture sampling in the vertex shader, which is not always available.
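The CPU side of that trick could look roughly like the sketch below: per-instance properties are packed into a float array laid out as RGBA texels, and the vertex shader would turn the instance index into a texel coordinate. Everything here is an assumption for illustration (the class name, the choice of x, y, rotation and scale as the four channels, one texel per instance); it is not KorGE code.

```java
// Sketch only: InstanceTexture and its layout (one RGBA float texel per instance
// holding x, y, rotation, scale) are hypothetical, not KorGE's implementation.
final class InstanceTexture {

    static final int FLOATS_PER_TEXEL = 4; // R, G, B, A

    /** Packs the properties row-major into a texture that is `width` texels wide. */
    static float[] pack(float[] x, float[] y, float[] rotation, float[] scale, int width) {
        int count = x.length;
        if (y.length != count || rotation.length != count || scale.length != count) {
            throw new IllegalArgumentException("all property arrays must have the same length");
        }
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        int height = (count + width - 1) / width; // rows needed, rounded up
        float[] texels = new float[width * height * FLOATS_PER_TEXEL];
        for (int i = 0; i < count; i++) {
            int base = i * FLOATS_PER_TEXEL;
            texels[base] = x[i];
            texels[base + 1] = y[i];
            texels[base + 2] = rotation[i];
            texels[base + 3] = scale[i];
        }
        return texels;
    }

    /** Texel column the vertex shader would sample for a given instance index. */
    static int texelColumn(int instance, int width) {
        return instance % width;
    }

    /** Texel row the vertex shader would sample for a given instance index. */
    static int texelRow(int instance, int width) {
        return instance / width;
    }
}
```

The resulting array would be uploaded as a floating-point texture (hence OES_texture_float), and the single quad drawn once per instance (hence ANGLE_instanced_arrays).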

We can also try to reorder how the objects' memory is allocated and stored so that it is as contiguous as possible.
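One common way to get contiguous storage on the JVM is a structure-of-arrays layout: instead of one heap object per bunny, each property lives in its own primitive array. The sketch below only illustrates that idea; `BunnyStore` and its fields are made-up names and not how KorGE stores its views.

```java
// Sketch only: BunnyStore is a hypothetical structure-of-arrays container,
// not a KorGE class.
final class BunnyStore {

    final float[] x;
    final float[] y;
    final float[] vx;
    final float[] vy;
    int count;

    BunnyStore(int capacity) {
        x = new float[capacity];
        y = new float[capacity];
        vx = new float[capacity];
        vy = new float[capacity];
    }

    /** Appends a bunny and returns its index. */
    int add(float px, float py, float velX, float velY) {
        if (count == x.length) {
            throw new IllegalStateException("store is full");
        }
        x[count] = px;
        y[count] = py;
        vx[count] = velX;
        vy[count] = velY;
        return count++;
    }

    /** Advances every bunny by dt seconds, walking each array sequentially. */
    void step(float dt) {
        for (int i = 0; i < count; i++) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
}
```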

@soywiz
Member

soywiz commented Feb 15, 2021

@emign Can you try master again on your Apple M1? You should be able to place a few more bunnies :)

@emign
Contributor Author

emign commented Feb 15, 2021

This brings some funny problems with recurring frame drops. See these videos:

https://youtu.be/5zZLRCo6vu4

https://youtu.be/TOC3Ez4Wm3k

@soywiz
Member

soywiz commented Feb 15, 2021

Can't watch the videos, they are marked as private

@emign
Contributor Author

emign commented Feb 15, 2021

fixed

@soywiz
Member

soywiz commented Feb 15, 2021

Is your master up to date?

@emign
Contributor Author

emign commented Feb 15, 2021

It's a completely new git clone

@soywiz
Member

soywiz commented Feb 15, 2021

I had that issue before and fixed it with this commit:
087021e

That's why I asked. So if you perform a git pull it doesn't bring new commits?

@emign
Contributor Author

emign commented Feb 15, 2021

HEAD was on 087021e
I pulled 08314d a6ebf72 now. Retesting

@emign
Contributor Author

emign commented Feb 15, 2021

@soywiz
Member

soywiz commented Feb 15, 2021

Just tried on an Intel Mac, and it seems to work properly. If you reduce it to batchMaxQuads = 4000, does the problem persist, and do sprites stop displaying at 4000?
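As a rough guide to what changing that value does: assuming all sprites share the same texture and blend state, so that a batch is only flushed when it is full, the number of draw calls per frame is the sprite count divided by the batch size, rounded up. The helper below is a sketch with made-up names, not the BatchBuilder2D logic.

```java
// Sketch only: BatchMath is an illustrative helper, not part of KorGE. It assumes a
// batch is flushed only when it reaches batchMaxQuads (same texture and blend state).
final class BatchMath {

    /** Draw calls needed to render `sprites` quads with at most `batchMaxQuads` per batch. */
    static long drawCalls(long sprites, long batchMaxQuads) {
        if (batchMaxQuads <= 0) {
            throw new IllegalArgumentException("batchMaxQuads must be positive");
        }
        if (sprites <= 0) {
            return 0;
        }
        return (sprites + batchMaxQuads - 1) / batchMaxQuads; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println("100k sprites, 4000 per batch: " + drawCalls(100_000, 4000) + " draw calls");
    }
}
```

If sprites vanish exactly at the batch size, that would point at the flush between batches rather than at the per-sprite work.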

@emign
Contributor Author

emign commented Feb 15, 2021

Better:
https://youtu.be/z70qZovsAww

@soywiz
Member

soywiz commented Feb 15, 2021

That still shows artifacts, which is a bit strange. Can you try to run it on JS? In this effort I tried to optimize all the targets, and each one has its own peculiarities. But I would like to see whether the JS target displays those artifacts on your computer with different batchMaxQuads values.

I was not able to reproduce your issue myself with master on Windows, macOS (Intel), JS and Android.

This might fix the issue: ee1b313

@emign
Contributor Author

emign commented Feb 15, 2021

JS does not show artifacts. But I believe the z-index of the pouring bunnies does not look right. The stream of new bunnies plops behind the rest just to reappear.

https://youtu.be/ygGLuXYWg40

That's with BatchBuilder2D.MAX_BATCH_QUADS.

@soywiz
Member

soywiz commented Feb 15, 2021

Have you done a git pull? I think I have fixed the issue you had here: ee1b313

@emign
Contributor Author

emign commented Feb 15, 2021

just saw that after my answer. building as we speak

@emign
Contributor Author

emign commented Feb 15, 2021

Yes, it's fixed.
Woohoo!

It now takes 120k bunnies on the aarch64 JDK to drop under 50 FPS. But that's on the MacBook Air.
I have to test it on the mini later/tomorrow (the results from above are from the mini).

@soywiz
Member

soywiz commented Feb 15, 2021

That's much worse then? Or is that Rosetta?

Azul Apple Silicon JDK: FPS drops below 50 at around 250k bunnies. Under 2 seconds from launch to window on the third build.
AdoptOpenJDK Rosetta/x64: FPS drops below 50 at around 90k bunnies. 8 seconds from launch to window on the third build.

@emign
Contributor Author

emign commented Feb 15, 2021

Yes, it is worse for the aarch64 JDK.
The good news is that the x86/Rosetta version is at 120k too now.
But the aarch64 version regressed.

@soywiz
Member

soywiz commented Feb 15, 2021

If VisualVM works on aarch64, could you profile both the old, faster version and the new, slower version on aarch64 when you have time? Please send me the snapshot .nps files from each, covering 20 seconds of CPU sampling with 400K sprites.

2021-02-15 (9)

Press Sampler -> CPU, then Snapshot, then the diskette icon (Export Snapshot data) to save it.

@emign
Contributor Author

emign commented Feb 15, 2021

ofc. But I cannot do it today.
Well tomorrow is in 19 minutes but I mean this night :)

@emign
Contributor Author

emign commented Feb 16, 2021

I tested with 7d92c54 on the mini again.

Drop under 50 FPS
Mac mini x86/rosetta JDK: around 200k Sprites
Mac mini aarch64 JDK: around 330k Sprites
MacBook Air x86/rosetta JDK: around 200k Sprites
MacBook Air aarch64 JDK: around 330k Sprites
(I tried it with power attached and on battery)

So the results are consistent again, which makes sense, because they have the exact same CPU/GPU. Did you do any optimizations in between?

@soywiz
Member

soywiz commented Apr 29, 2021

Hey @emign ! How are you doing?

Could you by chance try: ./gradlew :samples:bunnymark-fast:runJvm on the Apple M1? And try to resize the window to make it smaller. The -fast sample should now be GPU-bound, and I wonder if it can reach 60 FPS with 800K bunnies.

Development

Successfully merging this pull request may close these issues.

Unsatisfied Link Error with JVM on Apple Silicon
2 participants