Crash on ppc64 big endian #625

Closed
Doctorj128 opened this issue Oct 24, 2024 · 34 comments

@Doctorj128

I'm on a Power Mac G5 Quad (2.5 GHz) with a Radeon HD 5770 and 16 GB RAM, running Gentoo Linux on kernel 6.6.52.
I got the 'base' folder from my Mac copy of the game, which was installed on my second hard drive.

The game reaches this screen before crashing:
Screenshot_2024-10-24_17-18-16

Here's the log:
log.txt

@DanielGibson
Member

I don't have such hardware so I'm afraid you'll have to debug it yourself :-/
As someone running Gentoo on obscure hardware, you probably know how to use gdb?

@Doctorj128
Author

I think there must be some endianness issues in imgui_draw.cpp:

```
in_grabKeyboard: Will *not* grab the keyboard if mouse is grabbed, so global keyboard-shortcuts (like Alt-Tab) will still work
dhewm3: /home/doctorj1/programs/dhewm3/neo/libs/imgui/imgui_draw.cpp:2598: ImFont* ImFontAtlas::AddFontFromMemoryTTF(void*, int, float, const ImFontConfig*, const ImWchar*): Assertion `font_data_size > 100 && "Incorrect value for font_data_size!"' failed.

Thread 1 "dhewm3" received signal SIGABRT, Aborted.
0x00007ffff74a0fdc in ?? () from /usr/lib64/libc.so.6
(gdb) bt
#0  0x00007ffff74a0fdc in ?? () from /usr/lib64/libc.so.6
#1  0x00007ffff7442564 in raise () from /usr/lib64/libc.so.6
#2  0x00007ffff742623c in abort () from /usr/lib64/libc.so.6
#3  0x00007ffff7438298 in ?? () from /usr/lib64/libc.so.6
#4  0x00007ffff743833c in .__assert_fail () from /usr/lib64/libc.so.6
#5  0x00000001003ab08c in ImFontAtlas::AddFontFromMemoryTTF (this=this@entry=0x104a00cf0, 
    font_data=font_data@entry=0x7ffed7e70010, font_data_size=font_data_size@entry=-266861056, 
    size_pixels=size_pixels@entry=18, font_cfg_template=<optimized out>, glyph_ranges=glyph_ranges@entry=0x0)
    at /home/doctorj1/programs/dhewm3/neo/libs/imgui/imgui_draw.cpp:2598
#6  0x00000001003ab5c0 in ImFontAtlas::AddFontFromMemoryCompressedTTF (this=0x104a00cf0, 
    compressed_ttf_data=compressed_ttf_data@entry=0x10056ccf0 <D3::ImGuiHooks::ProggyVector_compressed_data>, 
    compressed_ttf_size=compressed_ttf_size@entry=198655, size_pixels=18, font_cfg_template=font_cfg_template@entry=0x0, 
    glyph_ranges=glyph_ranges@entry=0x0) at /home/doctorj1/programs/dhewm3/neo/libs/imgui/imgui_draw.cpp:2616
#7  0x0000000100435df8 in D3::ImGuiHooks::NewFrame () at /home/doctorj1/programs/dhewm3/neo/sys/sys_imgui.cpp:337
#8  0x000000010016ccd4 in idCommonLocal::Frame (this=0x1009f1318 <commonLocal>)
    at /home/doctorj1/programs/dhewm3/neo/framework/Common.cpp:2445
#9  0x00000001000855e0 in main (argc=<optimized out>, argv=0x7ffffffff2d8)
    at /home/doctorj1/programs/dhewm3/neo/sys/linux/main.cpp:452
```

@DanielGibson
Member

Interesting! It could be that Dear ImGui's compressed font data only works on little endian; I'll try to look into that.

Does dhewm3 work if you disable Dear ImGui (pass -DIMGUI=OFF to cmake)?
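The negative font_data_size in the backtrace fits the little-endian theory. Below is a hedged sketch in C, assuming two things not stated in this thread: that the compressed font is stored the way Dear ImGui's binary_to_compressed_c tool emits it by default (an array of unsigned int, each packing 4 bytes in little-endian order), and that the decompressed size is read as a big-endian value from bytes 8..11, as stb_decompress_length() does. The header word 0xF0180600 is reconstructed from the logged value -266861056 rather than taken from dhewm3's source, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Lays out one 32-bit array element in memory the way a host of the given
   endianness would: big_endian != 0 models PPC64 BE, 0 models x86/ARM LE. */
static void word_to_memory(uint32_t word, int big_endian, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (unsigned char)((word >> shift) & 0xFF);
    }
}

/* Same formula as stb_decompress_length(): big-endian length in bytes 8..11. */
static uint32_t decompress_length(const unsigned char *input)
{
    return ((uint32_t)input[8] << 24) | ((uint32_t)input[9] << 16)
         | ((uint32_t)input[10] << 8) | (uint32_t)input[11];
}

/* Takes the first three words of a compressed-font array, lays them out as
   the given host would, and reads the length field from the raw bytes. */
static uint32_t length_seen_by_host(const uint32_t words[3], int big_endian)
{
    unsigned char bytes[12];
    for (int w = 0; w < 3; w++)
        word_to_memory(words[w], big_endian, &bytes[4 * w]);
    return decompress_length(bytes);
}
```

With the header words { 0x0000bc57, 0x00000000, 0xF0180600 }, a little-endian host reads a length of 0x000618F0 (399600 bytes), while a big-endian host reads 0xF0180600, which as a 32-bit int is exactly the -266861056 from the log. So under these assumptions the data itself is fine; only the uint32 packing makes it endian-dependent.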

@Doctorj128
Author

Aha! That worked!
It compiles and runs great now, but it looks like event triggers don't work at all. All the NPCs stay in their default A-pose and won't say anything or move. Good progress though!

Screenshot_2024-10-24_23-46-44
Screenshot_2024-10-24_23-45-34
Screenshot_2024-10-24_23-44-47

DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 24, 2024
instead of whatever else compression was used there.
Hopefully fixes big endian problems (dhewm#625)
@DanielGibson
Member

Great to hear it works better without Dear ImGui.
There have been reports of that T-pose problem on big endian before. Someone with such hardware (i.e. not me) needs to debug this.

Can you check whether ImGui works with this branch: https://github.com/DanielGibson/dhewm3/tree/imgui-base85
There, the font is compressed in a different format that should work with any endianness.
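A sketch of why a base85 string avoids the problem, modeled loosely on the Encode85Byte/Decode85 helpers in Dear ImGui's binary_to_compressed_c.cpp and imgui_draw.cpp (the function names below are hypothetical, not ImGui's). The encoded font is a plain char string, and the decoder rebuilds each 4-byte group with shifts instead of reinterpreting memory as unsigned int, so the decoded bytes should come out identical on little- and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a value 0..84 to a printable character starting at '#' (35),
   skipping the backslash so the text fits in a C string literal. */
static char encode85_digit(uint32_t x)
{
    x = (x % 85) + 35;
    return (char)((x >= '\\') ? x + 1 : x);
}

/* Inverse of encode85_digit(). */
static uint32_t decode85_digit(char c)
{
    return (c >= '\\') ? (uint32_t)(c - 36) : (uint32_t)(c - 35);
}

/* Packs 4 bytes into a 32-bit value with explicit shifts (little-endian
   order by convention) and emits 5 base85 digits, least significant first. */
static void encode85_group(const unsigned char in[4], char out[5])
{
    uint32_t v = (uint32_t)in[0] | ((uint32_t)in[1] << 8)
               | ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
    for (int i = 0; i < 5; i++, v /= 85)
        out[i] = encode85_digit(v);
}

/* Rebuilds the 32-bit value and writes the bytes back with shifts, so the
   result never depends on how the host stores a uint32_t in memory. */
static void decode85_group(const char in[5], unsigned char out[4])
{
    uint32_t v = decode85_digit(in[0]) + 85 * (decode85_digit(in[1])
               + 85 * (decode85_digit(in[2]) + 85 * (decode85_digit(in[3])
               + 85 * decode85_digit(in[4]))));
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xFF);
}
```

Five base85 digits are enough because 85^5 is larger than 2^32, and the cost is roughly 25% more characters than the raw bytes.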

@DanielGibson
Member

see also #472 for the T-Pose issue

@Doctorj128
Author

Yeah, looks like that branch works.
I'd love to help debug the NPC pose issue, but I don't really know where to start. Let me know if you want me to test anything specific!

@Link4Electronics

Link4Electronics commented Oct 25, 2024

I was about to report this issue. I compiled the imgui-base85 branch and it works fine, apart from the T-pose problem. I noticed that during cutscenes, NPCs that move don't have the T-pose, but other NPCs in the background that don't move stay in their T-pose. Another thing I noticed: it's not possible to open the PDA. It tries to open but closes immediately, and it keeps trying to open for the rest of the game, but only when looking left and up; when looking right and down, it doesn't try to open the PDA.
20241024_221534

@DanielGibson
Copy link
Member

I wish I had an idea where to start debugging this :-/
One thing worth trying might be starting dhewm3 with the arguments +set com_forceGenericSIMD 1, so it doesn't try to use AltiVec. Apart from that, I don't really know where to start either.

@DanielGibson
Member

If com_forceGenericSIMD 1 doesn't help, you could try running the game in valgrind to see if it accesses invalid or uninitialized memory.
Running it in valgrind can be really slow, so it makes sense to prepare a minimal testcase.
So first, without valgrind, try running
./dhewm3 +map testmaps/test_box +spawn marscity_security_goggles_pda
Is that security guy in that A-pose (or T-pose or whatever it is)?
If not, try spawning something else: Open the console (Shift+Esc) and type spawn marscity_ and press the Tab key to see possible autocompletions. Maybe try marscity_soldier_bald_pda or marscity_civilian1

Once you found someone to spawn who shows the broken behavior, quit dhewm3 and run
valgrind ./dhewm3 +map testmaps/test_box +spawn marscity_civilian1
(or whatever model worked for you)
Now you may have to wait for several minutes for dhewm3 to start, load that level etc.
Eventually the level should have loaded and the spawned model should be visible. If possible, quickly look around, then quit dhewm3.
Now check if valgrind has written anything interesting to the terminal (the lines will start with ==12345==, if 12345 was the PID).
Post those lines here (scroll all the way up to where you entered the command, to make sure you didn't miss anything).

@Link4Electronics

Link4Electronics commented Oct 27, 2024

Oh hey, sorry for taking so long to answer. I tried the command ./dhewm3 +map testmaps/test_box +spawn marscity_security_goggles_pda
The security guard already spawns in the A-pose.
20241027_135652
I tried spawning other character models, and all of them have the A-pose. I also tried with valgrind, but it only printed 6309, not a 12345 PID.

And don't worry too much if this doesn't work. I understand that PowerPC big endian isn't a common platform; it's already a miracle that it compiles and runs 🤣. Thanks for the care anyway. We're just sharing and reporting the issue in case it helps narrow down the problem, not demanding or expecting a fix.

A side note: the colors in the game render correctly. I mention that because I've compiled many open-source projects; some render fine, but some have their color channels swapped, because on big endian the bytes of a packed color end up in a different order (e.g. BGRA instead of the RGBA order traditional on little endian). For example, dhewm3 renders the colors correctly, while quakespawn doesn't (though that could be a Mesa3D driver problem). My suspicion is that the A-pose is related to the physics logic/engine or to actor animation? Maybe a byteswap is needed on an array or vector somewhere.

Best regards,
Link.

@DanielGibson
Member

DanielGibson commented Oct 27, 2024

It doesn't matter if the number is 6309, 12345 was just a placeholder.
The question is whether valgrind printed any warnings or errors about uninitialized reads or invalid writes or such.

The weird thing is that dhewm3 works at least on 32-bit big endian, like Mac OS X 10.5 on PowerPC, and according to #472 (comment) also on 32-bit big-endian PPC with Linux.

DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 29, 2024
instead of whatever else compression was used there.
Hopefully fixes big endian problems (dhewm#625)
@DanielGibson
Member

Can you try a debug build of this branch: https://github.com/DanielGibson/dhewm3/tree/PPC64BE-debug ?
If you're super lucky, it works (then we only need to figure out which of my hacks and fixes fixed it). If you're a bit less lucky, you'll get assertions that might help debug the problem. If you're unlucky, it'll be the same shit as before.

If you get an assertion, please reproduce it in gdb and get a backtrace.

@Doctorj128
Author

Hi, I've just tried the new branch. Tragically I don't think anything has changed :(
I'd love to try and use valgrind to help, but I think that would mean recompiling glibc, which will probably take ages.

@DanielGibson
Member

DanielGibson commented Oct 30, 2024

Tragically I don't think anything has changed :(

That's a pity. Just to be sure, is it the same for you, @Link4Electronics ?

Another thing to try to hopefully narrow down the problem:
run ./dhewm3 +map testmaps/test_box
Once the map is loaded, open the console (Shift+Esc) and enter testModel marscity_civilian1.
This should spawn the scientist who sits in the hangar right at the beginning of the game, but in T-pose.
Now (still in the console) enter testAnim stand.
Now the scientist should be sitting, like this:
[Screenshot: the scientist sitting, as expected]
What does it do on PPC64?

What does it look like if you then enter r_showSkel 1 in the console?

@Doctorj128
Author

The result is exactly the same. He does sit down correctly, and this message is printed:
anim 'stand', 4.959 seconds. 120 frames

Results of r_showSkel 1:
Screenshot_2024-10-30_10-17-43

Some animations do actually work, such as the characters blinking. It also looks like RaiseWeapon() is being called constantly, multiple times every frame.

@Link4Electronics

Link4Electronics commented Oct 30, 2024

With the PPC64BE-debug branch I got the same behavior. I tried running it in valgrind, but it crashed when it was about to load testmaps/test_box. It's interesting that PPC32 doesn't have this behavior. Another side note: some people saw similar behavior with the sm64ex project. It compiles on PPC64 but hangs after the loading screen, while PPC32 is fine, and just making that single change from the OP's post (s32 word to s64 word) makes sm64ex work fine on PPC64.

@DanielGibson
Member

DanielGibson commented Oct 30, 2024

but when it was about to load testmaps/test_box it crashed.

It only crashes when running in valgrind? Or does that branch always crash when loading the map?
Does anything get printed when that happens?

by just doing that single change from OP post, s32 word to s64 word, sm64ex works fine on PPC64

I was also looking for similar unions in dhewm3, but the ones I found apparently don't cause the issue, so it must be something else

@DanielGibson
Member

Before I forget it, something different: We got the ImGui code to stop crashing, but does it actually work?
If you open the advanced settings menu by pressing F10 (or, if that doesn't work, by entering dhewm3Settings in the console), does it look as expected, i.e. like this?
[Screenshot: the expected dhewm3 settings menu]

@DanielGibson
Copy link
Member

DanielGibson commented Oct 30, 2024

Oh, and yet another thing: one of the screenshots above shows this in the console:

WARNING: script/map_marscity1.script(7): Thread 'map_marscity1::main': Entity not found for event 'trigger'. Terminating thread.

Do you also get any warnings when running ./dhewm3 +map testmaps/test_box +spawn marscity_civilian1? (The only warning I get is "WARNING: idAI_marscity_civilian1_40 has no AAS file" which is expected for the test level)

Update: For testing this, ideally use the latest state of https://github.com/DanielGibson/dhewm3/tree/PPC64BE-debug - I just added a commit with additional debug prints and assertions.
If the warnings can't be reproduced with the test level, start a new game (do not load a savegame, in case whatever state is broken gets saved!) and post the warnings that get printed when doing that.

@DanielGibson
Member

DanielGibson commented Oct 30, 2024

I might have a fix, please test the latest state of the aforementioned PPC64BE-debug branch.

Thinking about it again, it's most probably not fixed completely yet, though I think I at least know the cause now.

Please still check Dear ImGui and do the tests for the "Entity not found" warnings

@Link4Electronics

Link4Electronics commented Oct 30, 2024

Sorry it took me a while to answer. Here's the valgrind log from ./dhewm3 +map testmaps/test_box +spawn marscity_civilian1; this time it didn't crash.
log.txt from the branch PPC64BE-debug

Here's a photo of how ImGui renders on big endian (it's not a problem specific to the dhewm3 project, unless something related is going on). The Shipwright project does the same (pressing F1 opens the Dear ImGui menu, and it's all wrong), while 3D Space Cadet somehow renders ImGui OK on big endian.
20241030_162139

spacecadet (probably using a very old version of ImGui)
[Screenshot: ImGui rendering in 3D Space Cadet]
I should report this issue to DearImGui project.

@DanielGibson
Copy link
Member

Don't worry, I think I know how to fix the Dear ImGui issue, I just wanted to make sure it actually happens before doing the change.

DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 30, 2024
@DanielGibson
Member

I just pushed a fix for the ImGui color issue; I hope it works.

If you want to tell other projects how to fix it, they just need to add the following to imconfig.h:

```c
// NOTE: D3_IS_BIG_ENDIAN is dhewm3-specific, I set it from CMake
// (it gets passed to the compiler as `-DD3_IS_BIG_ENDIAN=1` or =0 for little endian)
// so you'll need to adjust that line for your project
#if D3_IS_BIG_ENDIAN
  #define IM_COL32_R_SHIFT    24
  #define IM_COL32_G_SHIFT    16
  #define IM_COL32_B_SHIFT    8
  #define IM_COL32_A_SHIFT    0
  #define IM_COL32_A_MASK     0x000000FF
#endif
```
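Why those shifts help: Dear ImGui packs each vertex color into a 32-bit integer using the IM_COL32_*_SHIFT values, and (assuming the renderer uploads that color as four unsigned bytes, as ImGui's OpenGL backends typically do) the GPU reads the bytes in memory order, which has to be R, G, B, A. The small C model below uses hypothetical names and mirrors the IM_COL32(R,G,B,A) formula with the shifts as parameters.

```c
#include <assert.h>
#include <stdint.h>

/* A set of IM_COL32_*_SHIFT values. */
typedef struct { int r, g, b, a; } col32_shifts;

/* Dear ImGui's default shifts, and the big-endian override from imconfig.h. */
static const col32_shifts LE_SHIFTS = { 0, 8, 16, 24 };
static const col32_shifts BE_SHIFTS = { 24, 16, 8, 0 };

/* Same formula as the IM_COL32(R,G,B,A) macro, parameterized on the shifts. */
static uint32_t pack_col32(col32_shifts s, uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << s.a) | ((uint32_t)b << s.b)
         | ((uint32_t)g << s.g) | ((uint32_t)r << s.r);
}

/* Byte order of the packed color in the vertex buffer on a host of the
   given endianness, i.e. what a 4 x unsigned byte color attribute sees. */
static void col32_memory_bytes(uint32_t col, int big_endian, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (uint8_t)((col >> shift) & 0xFF);
    }
}
```

With the default shifts, a big-endian host stores the bytes as A, B, G, R, which matches the scrambled colors in the screenshot; with the overridden shifts the memory order is R, G, B, A again. Alpha then lives in the lowest byte, which is why IM_COL32_A_MASK has to be redefined as 0x000000FF too.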

I'll try to fix the remaining problems with the script code now.

@Link4Electronics

Yeap, progress! Kudos!
20241030_204225

@DanielGibson
Member

I just pushed another commit that hopefully fixes the T-pose problem as well :)

@Link4Electronics

Link4Electronics commented Oct 31, 2024

Impressive! Congratulations and happy Halloween! xD
20241030_210923
The PDA and the entrance scanner scene work, and now even the NPCs talk!

@DanielGibson
Copy link
Member

Now I only need to clean up all that shit and patch the Resurrection of Evil code as well... and eventually the mods :-/

DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 31, 2024
instead of whatever else compression was used there.
Fixes crash on Big Endian systems (dhewm#625)
DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 31, 2024
DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 31, 2024
idInterpreter::Push() is used only for int and (reinterpreted) float
values, not pointers (as far as I can tell), so 32bit values on all
relevant platforms.
It stored its value as intptr_t at `&localstack[ localstackUsed ]` - on
64bit platforms intptr_t is 64bit.
Unfortunately, all code reading from the stack just got a pointer
to `&localstack[ localstackUsed ]` in the type they want to read
(like `int*` or `float*`) and read that. On Little Endian that happens
to work, on 64bit Big Endian it reads the wrong 4 bytes of the intptr_t,
so it doesn't work.

fixes dhewm#625, dhewm#472
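The commit message above describes a classic 64-bit big-endian trap. Here is a minimal C model of it, with hypothetical function names (this is not dhewm3's actual idInterpreter code, and the "push exactly 4 bytes" variant is just one way to make both sides agree): a stack slot is written as a 64-bit integer and read back through a 4-byte int pointer. Byte order is simulated explicitly so the result doesn't depend on the machine running the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores the low `size` bytes of `value` at `dst` in the given byte order,
   i.e. what a native store of a `size`-byte integer does on that host. */
static void store_native(unsigned char *dst, int64_t value, size_t size, int big_endian)
{
    uint64_t u = (uint64_t)value;
    for (size_t i = 0; i < size; i++) {
        size_t shift = big_endian ? 8 * (size - 1 - i) : 8 * i;
        dst[i] = (unsigned char)((u >> shift) & 0xFF);
    }
}

/* Reads the first 4 bytes at `src` as an int, like dereferencing an `int*`
   that points at the stack slot. */
static int32_t load_int32(const unsigned char *src, int big_endian)
{
    uint32_t u = 0;
    for (size_t i = 0; i < 4; i++) {
        size_t shift = big_endian ? 8 * (3 - i) : 8 * i;
        u |= (uint32_t)src[i] << shift;
    }
    return (int32_t)u;
}

/* Buggy pattern: push stores a 64-bit intptr_t, the reader reads an int. */
static int32_t push64_read32(int32_t value, int big_endian)
{
    unsigned char slot[8];
    store_native(slot, value, 8, big_endian);
    return load_int32(slot, big_endian);
}

/* Consistent pattern: push stores exactly as many bytes as the reader reads. */
static int32_t push32_read32(int32_t value, int big_endian)
{
    unsigned char slot[8] = {0};
    store_native(slot, value, 4, big_endian);
    return load_int32(slot, big_endian);
}
```

On little endian the first 4 bytes of the 64-bit slot hold the low half, so the buggy pattern happens to work. On 64-bit big endian they hold the high half, so small positive values read back as 0 and small negative ones as -1. This also fits the earlier observation that 32-bit big-endian builds were fine, since intptr_t is only 32 bits there.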
@DanielGibson
Member

The cleaned up code is in this branch: https://github.com/DanielGibson/dhewm3/tree/fix-ppc64be

@Link4Electronics @Doctorj128 could you please test that branch to make sure I got all the important changes, but also please play a bit more in case there are more Big Endian issues that haven't been found yet.

DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 31, 2024
instead of whatever other compression was used there.
Fixes crash on Big Endian systems (dhewm#625)
DanielGibson added a commit to DanielGibson/dhewm3 that referenced this issue Oct 31, 2024
@DanielGibson
Member

See also #626

@Link4Electronics

Link4Electronics commented Oct 31, 2024

I compiled fix-ppc64be.
I got further in the game. FPS dropped during the explosion scene, but the average is ~60 FPS, playable with an R5 230.
20241031_101117

Just for fun, I'm compiling with -maltivec right now...
I wonder how I could test idVec3.

@DanielGibson
Member

So does this shit work now or what? @Doctorj128

@Doctorj128
Author

Yes! Sorry, I've been fairly busy recently but I did test the changes and didn't find any further issues. Thanks!

@DanielGibson
Member

Great, thanks for testing, I merged this.
If any new bugs turn up, please create a new bug report.


3 participants