Only tested on my machine with Azul aarch64 1.8 JDK |
Thanks! And with this, everything works on Apple Silicon? If that's the case, that's amazing! Though it could make sense, since there is only JNA and no JNI at all. Although I hope they have tested it and have CI, let me check in the coming days that everything still works on the typical targets. |
Well. It compiles a working JVM target for the aarch64 JDK/JVM from Azul. What else there is to be discovered, I don't know :) I wanted to introduce a GitHub Action for Apple Silicon for korlibs, but there isn't any at the moment. |
For the record, this is the merged PR: java-native-access/jna#1297 BTW Nico, I'm just curious: have you tried the sprites10k sample or the bunnymark one from this repo, and do you have some numbers for CPU usage/FPS comparing an emulated x64 Java runtime with the native arm64 version? As far as I know the M1 is lightning fast, and with the GPU and memory embedded on the SoC plus 5nm lithography, this should be a beast |
Sorry, I don't have anything objectively or scientifically accurate. The debug window (F7) cannot be opened on macOS 11.2 because it freezes the JVM app. JVM: 100,000 bunnies: around 42-44% CPU The compile times are SIGNIFICANTLY shorter on a native aarch64 JDK than on the x86 one. I'm talking about a factor of 10x for the bunnymark. |
Awesome. And I assume you are talking about a clean build, right, not using prebuilt classes and stuff? Bunnymark should be both GPU- and CPU-bound, I guess, and for JNA I cannot use the native approach, since some of the OpenGL symbols are dynamically loaded via another OpenGL call, and I don't know a way to rebind them via code, though I guess that is only affected by the number of batches, which in bunnymark is pretty small. But well, it is using alpha blending and other things that could be optimized. Have you tried not using a retina resolution to see if the FPS changes? I usually use this: https://github.com/avibrazil/RDM to switch between retina and non-retina resolutions, to check that everything works. Retina has 4 times the number of pixels, and in the latest versions it also does 4x MSAA, I believe (not sure if that happens on macOS too, but I guess we could drop it when using retina resolutions) |
I could do some more research once it is implemented in KorGE itself. That would make testing much easier for me. IntelliJ in the aarch64 version does not like the HUGE KorGE-next project. Maybe it's Gradle too. aarch64 support is coming in Gradle 7.0. EDIT: The good news is that the KorGE-next repo seems to work well with the 7.0 version of Gradle (gradle-7.0-20210211230048+0000 at least) |
Azul Apple Silicon JDK: FPS drops below 50 at around 250k bunnies. Under 2 seconds on the third build until the window appears. BTW: I can work better now with the KorGE Next project if we want to test stuff out. I forgot to increase the IDEA memory limit since it is a new installation |
That's around 3-4x, nice. There are some improvements pending on both the GPU and the CPU side |
IDEA is suffering a lot with korge-next. I have simplified it a bit by not including the IntelliJ plugin here. Regarding optimizations: it would be possible to use geometry shaders, emitting just points and generating quads from them, effectively reducing the GPU bandwidth requirements to almost 1/4 and avoiding lots of floating-point work on the CPU. In the meantime there is a trick to achieve a similar effect that theoretically requires two widely available extensions: https://developer.mozilla.org/en-US/docs/Web/API/ANGLE_instanced_arrays https://developer.mozilla.org/en-US/docs/Web/API/OES_texture_float But it would require a value greater than one for this property: MAX_VERTEX_TEXTURE_IMAGE_UNITS We could create a quad and render lots of instances of it, then use a texture to read all the properties for all the stuff, but that requires texture sampling in the vertex shader, which is not always available. We can also try to reorder how the memory of the objects is allocated and stored so that it is as contiguous as possible |
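To make the bandwidth argument concrete, here is a minimal CPU-side sketch of the idea, not KorGE code. It assumes each sprite is described only by an (x, y) position with a fixed size, stored in one contiguous float array; the class and method names are hypothetical. `vertex` mimics what a vertex shader would compute per (instance, corner) pair under instancing, while `expandOnCpu` is the current-style approach of emitting all four corners on the CPU.

```java
final class InstancedQuads {
    // Per-instance record kept contiguous in one array: x, y.
    static final int FLOATS_PER_INSTANCE = 2;

    // Corners of a unit quad in triangle-strip order: (0,0) (1,0) (0,1) (1,1).
    static final float[] UNIT_QUAD = {0f, 0f, 1f, 0f, 0f, 1f, 1f, 1f};

    // What a vertex shader would do under instancing: read the instance
    // record and offset it by the scaled corner of the shared unit quad.
    static float[] vertex(float[] instances, int instance, int corner, float w, float h) {
        float x = instances[instance * FLOATS_PER_INSTANCE];
        float y = instances[instance * FLOATS_PER_INSTANCE + 1];
        return new float[] {x + UNIT_QUAD[corner * 2] * w, y + UNIT_QUAD[corner * 2 + 1] * h};
    }

    // Non-instanced path: the CPU writes 4 vertices (8 floats) per sprite.
    static float[] expandOnCpu(float[] instances, float w, float h) {
        int count = instances.length / FLOATS_PER_INSTANCE;
        float[] out = new float[count * 8];
        for (int i = 0; i < count; i++) {
            for (int c = 0; c < 4; c++) {
                float[] v = vertex(instances, i, c, w, h);
                out[i * 8 + c * 2] = v[0];
                out[i * 8 + c * 2 + 1] = v[1];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] instances = {10f, 20f, 30f, 40f}; // two sprites
        float[] expanded = expandOnCpu(instances, 2f, 3f);
        System.out.println("floats uploaded, expanded:  " + expanded.length);
        System.out.println("floats uploaded, instanced: " + instances.length);
    }
}
```

With positions only, the instanced upload is a quarter of the expanded one, which is where the "almost 1/4" figure comes from; real sprites carry more properties (texture coordinates, color, rotation), so the exact ratio would differ.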
@emign Can you try master again on your Apple M1? You should be able to place a few more bunnies :) |
This brings some funny problems with recurring frame drops. See these videos: |
Can't watch the videos, they are marked as private |
fixed |
Is your master up to date? |
It's a completely new git clone |
I had that issue before and fixed it with this commit: That's why I asked. So if you perform a |
Just tried on an intel mac, and seems to work properly. If you reduce to |
Better: |
Still, that shows artifacts, which is a bit strange. Can you try to run it on JS? In this effort I tried to optimize all the targets, and each one has its own peculiarities. But I would like to see whether the JS target displays those artifacts on your computer with different
This might fix the issue: ee1b313 |
JS does not show artifacts. But I believe the z-index of the pouring bunnies does not look right. The stream of new bunnies plops behind the rest just to reappear. That's on |
Have you done a |
just saw that after my answer. building as we speak |
Yes, it's fixed. 120k bunnies on the aarch64 JDK to drop under 50 FPS. But that's on the MacBook Air. |
That's much worse then? Or is that rosetta?
|
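One quick way to rule Rosetta in or out is to ask the JVM which architecture it was built for. The sketch below relies on the standard `os.arch` system property, which reports the architecture of the running JVM (so an x86_64 JDK running under Rosetta still reports `x86_64`); the `ArchCheck` class and `isArm64` helper are hypothetical names for illustration.

```java
import java.util.Locale;

final class ArchCheck {
    // Hypothetical helper: true when the os.arch value names a 64-bit ARM JVM.
    static boolean isArm64(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("aarch64") || arch.equals("arm64");
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        String os = System.getProperty("os.name");
        String verdict = isArm64(arch)
                ? "native arm64 JVM"
                : "not an arm64 JVM (on an M1 this would mean Rosetta)";
        System.out.println(os + " / " + arch + " -> " + verdict);
    }
}
```

Running this with the same JDK that Gradle uses for the sample shows which of the two cases the benchmark numbers belong to.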
yes it is worse for the aarch64 jdk. |
If VisualVM works on aarch64, could you profile the old version and the new one when you have time and send me the snapshot
|
ofc. But I cannot do it today. |
I tested with 7d92c54 on the mini again. Drop under 50 FPS So the results are consistent again, which makes sense, because they have the exact same CPU/GPU. Did you do any optimizations in between? |
Hey @emign ! How are you doing? Could you by chance try: |
Bump JNA Version to 5.7.0 for Apple Silicon Support
Fixes: korlibs/korge#335