Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge performance drop on r300 with default Thunder and Vega scene #1172

Open
illwieckz opened this issue Jun 1, 2024 · 56 comments
Open

Huge performance drop on r300 with default Thunder and Vega scene #1172

illwieckz opened this issue Jun 1, 2024 · 56 comments

Comments

@illwieckz
Copy link
Member

illwieckz commented Jun 1, 2024

I'm trying to investigate a huge performance drop affecting some low-end cards.

I currently suspect:

  • a bug in the driver itself.
  • something stupid done on our side similar to the way our skybox code was iterating over GPU memory directly:
    Sky performance cost #849 (comment)
  • a combination of the twos.

Here is the Thunder map, lowest preset, 640×480 resolution, default spectator scene (1024 -3456 448 -90 0), 3fps:

unvanquished_2024-06-01_165641_000

Here is the Vega map, lowest preset, 640×480 resolution, default spectator scene (1472 -1792 176 -180 0), 5fps:

unvanquished_2024-06-01_165744_000

This configuration is expected to achieve 20~30fps 60fps.

The testbench is a very low-end config, one of my favorites for identifying bottlenecks:

  • CPU: AMD Athlon 64 X2 4400+, dual core 2.3GHz.
  • RAM: 8Go DDR2 800MHz , 4×2Go Hynix HYMP125U64CP8-S6, dual channel.
  • GPU: ATi Radeon X550 HyperMemory, RV370 (r300, tiny ALU), 128Mo 64-bit DDR, PCIe x16 1.0.
  • OS: Ubuntu 23.10 mantic, Linux 6.5.0.
  • Driver: Mesa 23.2.1 (stock) or Mesa 24.0.7 (self built).

In the Thunder map, I suspect the bottleneck comes from rendering something in that room and/or the outside. It can be the electric arc effect on those pillars, and/or the lightning effect on the top of the outside of the building, as this room as huge windows on the outside. Leaving that room and losing sight with the outside brings back performance, leaving that room without losing sight with the outside keeps the performance low.

In the Vega map, I suspect the bottleneck comes from rendering that globe or something around that globe. From the 4 sides of the room, I recover performance by turning back to that globe.

I identified places were just turning around the view enables/disables the performance loss in both maps.

Thunder, -30 -1497 10 -20 0:

unvanquished_2024-06-01_165427_000

Thunder, -30 -1497 10 112 0:

unvanquished_2024-06-01_165451_000

Vega, 667 -1792 176 0 0:

unvanquished_2024-06-01_165815_000

Vega: 667 -1792 176 -180 0:

unvanquished_2024-06-01_165833_000

As you see, just turning around the head without moving from the place you are enables/disables the cause of the bottleneck.

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

The cause of the bottleneck is very hard to track down.

Orbit fails to show me the calltrace, only identifying a very costly ioctl without any identified caller. Once, Orbit managed to relate the ioctl to some render thread in the Bottom-Up view.

20240601-043231-000 orbit-unvanquished

20240601-012452-000 orbit-unvanquished

And gprofng also shows me something suspect with openal, but what if the i/o on the sound card is actually slowed down by the whole PCIe bus being on knees because of agressive io for rendering?

20240601-173623-000 gprofng-unvanquished

But gperftools seems to track it back to the r300 DRM driver through Mesa, more explicitely:

20240601-173732-000 gperftools-unvanquished

That's why I suspect something similar to what we had with the sky: that GPU has a slow PCIe 1.0 x16 bus, and if we process the memory GPU directly in our computation, this can be very bad.

@VReaperV
Copy link
Contributor

VReaperV commented Jun 1, 2024

As you see, just turning around the head without moving from the place you are enables/disables the cause of the bottleneck.

What if you move backwards from there? If perf drops once the globe is in view, then it is the issue, otherwise it's not.

@illwieckz
Copy link
Member Author

Both the electric arc effect and the globe materials use some tcMod:

textures/thunder_lightning/horizontallightning
{
	qer_editorImage textures/thunder_lightning_src/horizontallightning
	surfaceparm trans
	surfaceparm nolightmap
	surfaceparm nomarks
	surfaceparm nonsolid
	{
		map textures/thunder_lightning_src/horizontallightning
		blendfunc add
		tcmod scroll -8 -0
	}
}
models/mapobjects/vega/globe/globe
{
	surfaceparm nomarks
	surfaceparm nonsolid
	cull none

	{
		map models/mapobjects/vega/globe/globe
		tcmod scroll .05 0
	}
}

In the slow sky thread, SomaZ recommended to also look at tcMod:

Should probably also make sure any tcMods or else also don't read/write to tess.
-- #849 (comment)

@VReaperV
Copy link
Contributor

VReaperV commented Jun 1, 2024

tcMod uses a matrix uniform.

@illwieckz
Copy link
Member Author

Commenting out the tcmod scroll line in the vega globe shader doesn't bring back the performance.

I also verified that doing stage->bundle[ 0 ].numTexMods = 0 in FinishStages also disables the scrolling but doesn't bring back the performance.

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

viewpos relation fps
511 -1792 176 -180 0 globe in the back 40
615 -1792 176 -180 0 globe in the back 30
773 -1792 176 -180 0 globe in the back 17
844 -1792 176 -180 0 globe in the back 14
893 -1792 176 -180 0 inside globe 11
934 -1792 176 -180 0 inside globe 10
997 -1792 176 -180 0 inside globe 10
1006 -1792 176 -180 0 globe in front 10
1074 -1792 176 -180 0 globe in front 10
1143 -1792 176 -180 0 globe in front 9
1288 -1792 176 -180 0 globe in front 6
1382 -1792 176 -180 0 globe in front 6
1621 -1792 176 -180 0 globe in front 4

unvanquished_2024-06-01_181011_000

unvanquished_2024-06-01_181033_000

unvanquished_2024-06-01_181054_000

unvanquished_2024-06-01_181112_000

unvanquished_2024-06-01_181128_000

unvanquished_2024-06-01_181145_000

unvanquished_2024-06-01_181203_000

unvanquished_2024-06-01_181219_000

unvanquished_2024-06-01_181239_000

unvanquished_2024-06-01_181255_000

unvanquished_2024-06-01_181311_000

unvanquished_2024-06-01_181342_000

unvanquished_2024-06-01_181403_000

@illwieckz
Copy link
Member Author

One may also suspect the amount of tris, but vega was running properly in Tremulous.

map viewpos performance
thunder 1472 -1792 176 -180 0 slow
thunder 667 -1792 176 0 0 slow
thunder 667 -1792 176 -180 0 fast

unvanquished_2024-06-01_182516_000

unvanquished_2024-06-01_182541_000

unvanquished_2024-06-01_182618_000

map viewpos performance
vega 1024 -3456 448 -90 0 slow
vega -30 -1497 10 -20 0 slow
vega -30 -1497 10 112 0 fast

unvanquished_2024-06-01_182731_000

unvanquished_2024-06-01_182752_000

unvanquished_2024-06-01_182823_000

@illwieckz
Copy link
Member Author

But I can render the whole plat23 map (0 3568 3194 -90 66) with all tris visible, and keep a smooth 30fps:

unvanquished_2024-06-01_184012_000

unvanquished_2024-06-01_184024_000

@VReaperV
Copy link
Contributor

VReaperV commented Jun 1, 2024

But I can render the whole plat23 map (0 3568 3194 -90 66) with all tris visible, and keep a smooth 30fps:

For comparing things like that it'd be helpful to know how many triangles are in each case in particular (some value of r_speeds shows that I believe).

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

Same computer, but with onboard nvidia GPU.

  • GPU: Nvidia Geforce 6150 LE, NV4C/MCP61 (Curie, ), 128Mo shared DDR2, HyperTransport.

Where the comparison is not fair is that instead of dedicated memory the Nvidia chip uses shared DDR2 memory from the host and instead of PCIe 1.0 x16 bus the Nvidia uses HyperTransport (some onboard bus shared with CPU). This may help with ioctl when something on the GPU is read from/written to by the CPU.

The Nvidia has a larger ALU (but that should not make a difference as long as shader runs) and may accept larger images (or properly downscale them). The ATi one uses ARB_framebuffer_object but the Nvidia only provides the older EXT_framebuffer_object which is a subset (but that should make no difference because we only use the EXT_framebuffer_object part of ARB_framebuffer_object).

Like the ATi one is from the first generation of ATi cards to support OpenGL 2, that Nvidia card is from the first generation of Nvidia cards to support OpenGL 2, and is from the year after then ATi one. Both are low end, in fact the “LE“ string in the product name means “low end”.

That card can sustain solid 20fps on both scenes: (Thunder 1024 -3456 448 -90 0, Vega 1472 -1792 176 -180 0):

unvanquished_2024-06-01_193118_000

unvanquished_2024-06-01_192956_000

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

For comparison on the same computer and that old ATi card, the vega-beta1a map on Tremulous (some GrangerHub Tremulous 1.3 build) on OpenGL 1 renderer gets 130fps (almost 140fps) with lightmaps and 190fps with vertex lighting on same scene:

20240601-203505-000 tremulous

20240601-202816-000 tremulous

With the OpenGL 2 renderer it gets 40fps (almost 50fps):

20240601-203828-000 tremulous

Except for the vertex lighting, this is using the default Tremulous configuration (no “lowest”-like preset).

That other engine is also ridled with bugs, on such hardware the OpenGL1 renderer spawns lots of error logs every frame:

Mesa: error: GL_INVALID_VALUE in glLockArraysEXT(count)
Mesa: error: GL_INVALID_OPERATION in glUnlockArraysEXT(reexit)

and on such hardware the OpenGL2 renderer doesn't work at all with lightmaps, and some fonts are totally broken, and some texture blending is broken:

20240601-202057-000 tremulous

20240601-203254-000 tremulous

But the experiments says something: not only it may be possible to render such map on such hardware without any performance drop in that scene, but we may be able to achieve double the performance on such hardware. Edit: In fact we already achieve 60fps in other maps that don't trigger the slow down.

This test was done with the ATi card described in first message, on the computer described in first message.

@illwieckz
Copy link
Member Author

That's not the same map build that is used in Tremulous, but they should not be that different. And the amount of tris should be similar:

20240601-205355-000 tremulous

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

So, it looks like I uncovered not only one performance problem but two.

First performance problem is that serious performance drop in some places.

Second performance problem is that we may be able to get twice the same performance on some lower-end hardware but don't get it.

I expect such hardware to perform better with the OpenGL1 renderer, as at the time most of the game were doing things the GL1 way and only using shaders for some effects, so the hardware is probably architectured for that in mind. On the contrary we use shaders for everything.

But since the OpenGL 2 renderer of ioquake3 seems to still be twice faster than us when doing things the shader way in that range of hardware, I believe we should also get twice the performance we already have (reaching 40fps on such map).

And I'm very annoyed by how much it is hard to track down this. I rebuilt entirely Orbit and its dependencies to run on that old CPU, I have a own build of Mesa and libdrm (even libpciaccess) built with debug symbols, I installed all the debug packages I could find for the installed packages of my distro, including the one of the libc, and Orbit still fails to track down the slow code backtrace.

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

It looks like I got a calltrace with GDB on drmIoctl:

(gdb) break drmIoctl
Breakpoint 1 at 0x716d6f6ad070: file ../xf86drm.c, line 700.
(gdb) continue
Continuing.
[Switching to Thread 0x716d6e01ba80 (LWP 492851)]

Thread 1 "daemon" hit Breakpoint 1, drmIoctl (fd=15, request=request@entry=3223348318, arg=arg@entry=0x7ffec10c4a50) at ../xf86drm.c:700
700	{
(gdb) backtrace
#0  drmIoctl (fd=15, request=request@entry=3223348318, arg=arg@entry=0x7ffec10c4a50) at ../xf86drm.c:700
#1  0x0000716d6f6b0260 in drmCommandWriteRead (fd=<optimized out>, drmCommandIndex=drmCommandIndex@entry=30, data=data@entry=0x7ffec10c4a50, size=size@entry=32) at ../xf86drm.c:3186
#2  0x0000716d4a82539c in radeon_bo_do_map (bo=0x5688c5e046a0) at ../src/gallium/winsys/radeon/drm/radeon_drm_bo.c:445
#3  0x0000716d4a7e837e in r300_buffer_transfer_map (context=0x5688c260d040, resource=0x5688c5dd0430, level=<optimized out>, usage=802, box=0x7ffec10c4b30, ptransfer=0x5688c254bf78) at ../src/gallium/drivers/r300/r300_screen_buffer.c:146
#4  0x0000716d4a6b1730 in pipe_buffer_map_range (transfer=0x5688c254bf78, access=<optimized out>, length=4235264, offset=0, buffer=<optimized out>, pipe=<optimized out>) at ../src/gallium/auxiliary/util/u_inlines.h:398
#5  u_upload_alloc_buffer (min_size=4232544, upload=0x5688c254bf50) at ../src/gallium/auxiliary/util/u_upload_mgr.c:240
#6  u_upload_alloc (upload=0x5688c254bf50, min_out_offset=<optimized out>, size=48, alignment=alignment@entry=4, out_offset=out_offset@entry=0x7ffec10c4c84, outbuf=outbuf@entry=0x7ffec10c4c88, ptr=0x7ffec10c4c90) at ../src/gallium/auxiliary/util/u_upload_mgr.c:273
#7  0x0000716d4a6b2ca3 in u_vbuf_translate_buffers (unroll_indices=false, min_index=<optimized out>, num_vertices=3, start_vertex=264531, out_vb=1, vb_mask=<optimized out>, draw=<optimized out>, info=<optimized out>, key=<optimized out>, mgr=0x5688c25e1130) at ../src/gallium/auxiliary/util/u_vbuf.c:587
#8  u_vbuf_translate_begin (mgr=mgr@entry=0x5688c25e1130, info=info@entry=0x7ffec10c5b40, draw=draw@entry=0x7ffec10c5b24, start_vertex=start_vertex@entry=264531, num_vertices=num_vertices@entry=3, min_index=<optimized out>, unroll_indices=<optimized out>, misaligned=<optimized out>) at ../src/gallium/auxiliary/util/u_vbuf.c:805
#9  0x0000716d4a6b5120 in u_vbuf_draw_vbo (pipe=0x5688c260d040, info=0x7ffec10c5cd0, drawid_offset=93, indirect=0x0, draws=<optimized out>, num_draws=243) at ../src/gallium/auxiliary/util/u_vbuf.c:1726
#10 0x0000716d4a2d9cb3 in _mesa_validated_multidrawelements (ctx=ctx@entry=0x5688c24fcef0, index_bo=<optimized out>, mode=mode@entry=4, count=count@entry=0x5688c1947120 <tess+8432>, type=type@entry=5125, indices=indices@entry=0x5688c19451e0 <tess+432>, primcount=<optimized out>, basevertex=0x0) at ../src/mesa/main/draw.c:2118
#11 0x0000716d4a2dbbc9 in _mesa_MultiDrawElements (mode=4, count=0x5688c1947120 <tess+8432>, type=5125, indices=0x5688c19451e0 <tess+432>, primcount=243) at ../src/mesa/main/draw.c:2171
#12 0x00005688c0c6b099 in Tess_DrawElements () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:328
#13 0x00005688c0c6c320 in Render_generic3D (pStage=0x716d6c7148c0) at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:791
#14 0x00005688c0c6c38b in Render_generic (pStage=0x716d6c7148c0) at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:804
#15 0x00005688c0c71634 in Tess_StageIteratorColor () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:2601
#16 0x00005688c0c71f29 in Tess_End () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:2837
#17 0x00005688c0bc8387 in RB_RenderDrawSurfaces (fromSort=shaderSort_t::SS_DEPTH, toSort=shaderSort_t::SS_DEPTH, drawSurfFilter=DRAWSURFACES_ALL) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:903
#18 0x00005688c0bd4f33 in RB_RenderView (depthPass=true) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:4802
#19 0x00005688c0bd7e0c in DrawViewCommand::ExecuteSelf (this=0x716d4fa8fa74) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:5710
#20 0x00005688c0bd85e6 in RB_ExecuteRenderCommands (data=0x716d4fa8f04c) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:5961
#21 0x00005688c0bf9f72 in R_IssueRenderCommands (runPerformanceCounters=true) at Unvanquished/daemon/src/engine/renderer/tr_cmds.cpp:183
#22 0x00005688c0bfb07d in RE_EndFrame (frontEndMsec=0x0, backEndMsec=0x0) at Unvanquished/daemon/src/engine/renderer/tr_cmds.cpp:889
#23 0x00005688c0b35ce0 in SCR_UpdateScreen () at Unvanquished/daemon/src/engine/client/cl_scrn.cpp:327
#24 0x00005688c0b25dc6 in CL_Frame (msec=193) at Unvanquished/daemon/src/engine/client/cl_main.cpp:2087
#25 0x00005688c0a44870 in Com_Frame () at Unvanquished/daemon/src/engine/qcommon/common.cpp:958
#26 0x00005688c0a3d259 in Application::ClientApplication::Frame (this=0x5688c0f23f40 <Application::GetApp()::app>) at Unvanquished/daemon/src/engine/client/ClientApplication.cpp:96
#27 0x00005688c0d2de4f in Application::Frame () at Unvanquished/daemon/src/engine/framework/Application.cpp:87
#28 0x00005688c0d6f4d7 in main (argc=60, argv=0x7ffec10c62e8) at Unvanquished/daemon/src/engine/framework/System.cpp:784

@illwieckz
Copy link
Member Author

The slow code seems to be (if I caught the right drmIoctl call):

glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );

glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts,
  GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

I added this logger right before the glMultiDrawElements one:

Log::Warn( "=== tess.multiDrawPrimitives: %d", tess.multiDrawPrimitives );

Then on the default Vega spectator scene (1472 -1792 176 -180 0), it loops printing this:

…
Warn: === tess.multiDrawPrimitives: 243
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 8
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 6
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 40
Warn: === tess.multiDrawPrimitives: 10
Warn: === tess.multiDrawPrimitives: 10
Warn: === tess.multiDrawPrimitives: 21
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 12
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 14
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 1
[again]

@illwieckz
Copy link
Member Author

On the 667 -1792 176 0 0 slow scene, I get this:

Warn: === tess.multiDrawPrimitives: 246
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 8
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 9
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 23
Warn: === tess.multiDrawPrimitives: 17
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 17
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 20
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 5
Warn: === tess.multiDrawPrimitives: 5
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 9
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
[again]

On the 667 -1792 176 -180 0 fast scene (same position in space but looking in opposite direction), it loops printing this:

Warn: === tess.multiDrawPrimitives: 62
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 16
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 3
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 4
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 7
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 2
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
Warn: === tess.multiDrawPrimitives: 1
[again]

@illwieckz
Copy link
Member Author

Just as a quick test, reducing MAX_MULTIDRAW_PRIMITIVES doesn't help.

Also the ioq3 and grangerhub's tremulous source codes have no reference to glMultiDrawElements. 🤔️

@illwieckz
Copy link
Member Author

At the same places we call Tess_DrawElements that calls glMultiDrawElements, they call R_DrawElements that calls glDrawElements.

@illwieckz
Copy link
Member Author

Also for curiosity, I did that, but I still get 5fps on that default Vega scene:

                if ( tess.multiDrawPrimitives )
                {
-                       glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+                       if ( false )
+                       {
+                               glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+                       }
+                       else
+                       {
+                               for ( i = 0; i < tess.multiDrawPrimitives; i++ )
+                               {
+                                       glDrawElements( GL_TRIANGLES, tess.multiDrawCounts[ i ], GL_INDEX_TYPE, tess.multiDrawIndexes[ i ] );
+                               }
+                       }

@illwieckz
Copy link
Member Author

And if I interrupt, I interrupt in the middle of an ioctl called by my own glDrawElements:

Thread 1 (Thread 0x74bd4686ca80 (LWP 519073) "daemon"):
#0  __GI___ioctl (fd=fd@entry=15, request=request@entry=3223348317) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x000074bd47ef00a0 in drmIoctl (fd=15, request=request@entry=3223348317, arg=arg@entry=0x7fff5a18a700) at ../xf86drm.c:704
#2  0x000074bd47ef3260 in drmCommandWriteRead (fd=<optimized out>, drmCommandIndex=drmCommandIndex@entry=29, data=data@entry=0x7fff5a18a700, size=size@entry=32) at ../xf86drm.c:3186
#3  0x000074bd23076013 in radeon_create_bo (rws=rws@entry=0x5f49b6c500e0, size=size@entry=4370432, alignment=alignment@entry=4096, initial_domains=initial_domains@entry=2, flags=flags@entry=16, heap=heap@entry=0) at ../src/gallium/winsys/radeon/drm/radeon_drm_bo.c:632
#4  0x000074bd230767c5 in radeon_winsys_bo_create (rws=0x5f49b6c500e0, size=4370432, alignment=<optimized out>, domain=<optimized out>, flags=<optimized out>) at ../src/gallium/winsys/radeon/drm/radeon_drm_bo.c:1043
#5  0x000074bd2303a5e4 in r300_buffer_create (screen=0x5f49b6c51050, templ=0x7fff5a18a830) at ../src/gallium/drivers/r300/r300_screen_buffer.c:191
#6  0x000074bd22f036a3 in u_upload_alloc_buffer (min_size=4366992, upload=0x5f49b6cfe660) at ../src/gallium/auxiliary/util/u_upload_mgr.c:208
#7  u_upload_alloc (upload=0x5f49b6cfe660, min_out_offset=<optimized out>, size=2816, alignment=alignment@entry=4, out_offset=out_offset@entry=0x7fff5a18a974, outbuf=outbuf@entry=0x7fff5a18a978, ptr=0x7fff5a18a980) at ../src/gallium/auxiliary/util/u_upload_mgr.c:273
#8  0x000074bd22f04ca3 in u_vbuf_translate_buffers (unroll_indices=false, min_index=<optimized out>, num_vertices=176, start_vertex=272761, out_vb=1, vb_mask=<optimized out>, draw=<optimized out>, info=<optimized out>, key=<optimized out>, mgr=0x5f49b6d937a0) at ../src/gallium/auxiliary/util/u_vbuf.c:587
#9  u_vbuf_translate_begin (mgr=mgr@entry=0x5f49b6d937a0, info=info@entry=0x7fff5a18b830, draw=draw@entry=0x7fff5a18b814, start_vertex=start_vertex@entry=272761, num_vertices=num_vertices@entry=176, min_index=<optimized out>, unroll_indices=<optimized out>, misaligned=<optimized out>) at ../src/gallium/auxiliary/util/u_vbuf.c:805
#10 0x000074bd22f07120 in u_vbuf_draw_vbo (pipe=0x5f49b6cad050, info=0x7fff5a18b9e0, drawid_offset=0, indirect=0x0, draws=<optimized out>, num_draws=1) at ../src/gallium/auxiliary/util/u_vbuf.c:1726
#11 0x000074bd22b2c18d in _mesa_validated_drawrangeelements (ctx=ctx@entry=0x5f49b6caf170, index_bo=<optimized out>, mode=mode@entry=4, index_bounds_valid=index_bounds_valid@entry=false, start=start@entry=0, end=end@entry=4294967295, count=276, type=5125, indices=0x2353c8, basevertex=0, numInstances=1, baseInstance=0) at ../src/mesa/main/draw.c:1734
#12 0x000074bd22b2d689 in _mesa_DrawElementsBaseVertex (mode=4, count=276, type=5125, indices=0x2353c8, basevertex=0) at ../src/mesa/main/draw.c:1877
#13 0x00005f49b54c70c2 in Tess_DrawElements () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:337
#14 0x00005f49b54c8344 in Render_generic3D (pStage=0x74bd44f148c0) at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:802
#15 0x00005f49b54c83af in Render_generic (pStage=0x74bd44f148c0) at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:815
#16 0x00005f49b54cd658 in Tess_StageIteratorColor () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:2612
#17 0x00005f49b54cdf4d in Tess_End () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:2848
#18 0x00005f49b5424387 in RB_RenderDrawSurfaces (fromSort=shaderSort_t::SS_DEPTH, toSort=shaderSort_t::SS_DEPTH, drawSurfFilter=DRAWSURFACES_ALL) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:903
#19 0x00005f49b5430f33 in RB_RenderView (depthPass=true) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:4802
#20 0x00005f49b5433e0c in DrawViewCommand::ExecuteSelf (this=0x74bd2828fa74) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:5710
#21 0x00005f49b54345e6 in RB_ExecuteRenderCommands (data=0x74bd2828f04c) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:5961
#22 0x00005f49b5455f72 in R_IssueRenderCommands (runPerformanceCounters=true) at Unvanquished/daemon/src/engine/renderer/tr_cmds.cpp:183
#23 0x00005f49b545707d in RE_EndFrame (frontEndMsec=0x0, backEndMsec=0x0) at Unvanquished/daemon/src/engine/renderer/tr_cmds.cpp:889
#24 0x00005f49b5391ce0 in SCR_UpdateScreen () at Unvanquished/daemon/src/engine/client/cl_scrn.cpp:327
#25 0x00005f49b5381dc6 in CL_Frame (msec=188) at Unvanquished/daemon/src/engine/client/cl_main.cpp:2087
#26 0x00005f49b52a0870 in Com_Frame () at Unvanquished/daemon/src/engine/qcommon/common.cpp:958
#27 0x00005f49b5299259 in Application::ClientApplication::Frame (this=0x5f49b577ff40 <Application::GetApp()::app>) at Unvanquished/daemon/src/engine/client/ClientApplication.cpp:96
#28 0x00005f49b5589e73 in Application::Frame () at Unvanquished/daemon/src/engine/framework/Application.cpp:87
#29 0x00005f49b55cb4fb in main (argc=60, argv=0x7fff5a18c008) at Unvanquished/daemon/src/engine/framework/System.cpp:784

@illwieckz
Copy link
Member Author

illwieckz commented Jun 1, 2024

It looks to happen right after the autosprite deform in RB_RenderDrawSurfaces (skipping that autosprite deform doesn't help):

if ( oldShader != nullptr )
{
	if ( oldShader->autoSpriteMode && !(tess.attribsSet & ATTR_ORIENTATION) ) {
		Tess_AutospriteDeform( oldShader->autoSpriteMode,
				       0, tess.numVertexes,
				       0, tess.numIndexes );
	}
	Tess_End(); // <-- here
}

@illwieckz
Copy link
Member Author

So I did that:

diff --git a/src/engine/renderer/tr_shade.cpp b/src/engine/renderer/tr_shade.cpp
index c99eb8369..3063a5117 100644
--- a/src/engine/renderer/tr_shade.cpp
+++ b/src/engine/renderer/tr_shade.cpp
@@ -311,6 +311,8 @@ void GLSL_FinishGPUShaders()
 Tess_DrawElements
 ==================
 */
+#include <ctime>
+#include <cstdio>
 void Tess_DrawElements()
 {
 	int i;
@@ -325,7 +327,10 @@ void Tess_DrawElements()
 	{
 		if ( tess.multiDrawPrimitives )
 		{
+			clock_t start = clock();
 			glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+			clock_t diff = clock() - start;
+			printf("=== time: %ld, primitives: %d\n", diff, tess.multiDrawPrimitives);
 
 			backEnd.pc.c_multiDrawElements++;
 			backEnd.pc.c_multiDrawPrimitives += tess.multiDrawPrimitives;

I get this:

=== time: 108147, primitives: 243
=== time: 1714, primitives: 1
=== time: 102, primitives: 2
=== time: 662, primitives: 8
=== time: 79, primitives: 3
=== time: 39, primitives: 6
=== time: 70, primitives: 2
=== time: 767, primitives: 40
=== time: 5316, primitives: 10
=== time: 11874, primitives: 10
=== time: 10550, primitives: 21
=== time: 7498, primitives: 4
=== time: 1557, primitives: 1
=== time: 1463, primitives: 1
=== time: 9861, primitives: 12
=== time: 177, primitives: 7
=== time: 43, primitives: 1
=== time: 41, primitives: 3
=== time: 112, primitives: 4
=== time: 39, primitives: 1
=== time: 176, primitives: 14
=== time: 96, primitives: 3
=== time: 122, primitives: 1
=== time: 2139, primitives: 2
=== time: 3501, primitives: 4
=== time: 1959, primitives: 7
=== time: 1757, primitives: 2
=== time: 3340, primitives: 2
=== time: 1979, primitives: 2
=== time: 42, primitives: 1
=== time: 26, primitives: 1
=== time: 39, primitives: 1
=== time: 21, primitives: 1
=== time: 2012, primitives: 1
=== time: 50, primitives: 1
=== time: 1914, primitives: 1
=== time: 69, primitives: 1
=== time: 39, primitives: 1
=== time: 30, primitives: 1
=== time: 1677, primitives: 3
=== time: 94, primitives: 1
=== time: 36, primitives: 7
=== time: 3622, primitives: 7
=== time: 100, primitives: 7
=== time: 60, primitives: 1

@illwieckz
Copy link
Member Author

With:

diff --git a/src/engine/renderer/tr_shade.cpp b/src/engine/renderer/tr_shade.cpp
index c99eb8369..01aac402c 100644
--- a/src/engine/renderer/tr_shade.cpp
+++ b/src/engine/renderer/tr_shade.cpp
@@ -311,6 +311,8 @@ void GLSL_FinishGPUShaders()
 Tess_DrawElements
 ==================
 */
+#include <ctime>
+#include <cstdio>
 void Tess_DrawElements()
 {
 	int i;
@@ -325,7 +327,10 @@ void Tess_DrawElements()
 	{
 		if ( tess.multiDrawPrimitives )
 		{
+			clock_t start = clock();
 			glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+			clock_t diff = clock() - start;
+			printf("=== time: %ld, primitives: %d, time/primitives: %f\n", diff, tess.multiDrawPrimitives, (float) diff/tess.multiDrawPrimitives);
 
 			backEnd.pc.c_multiDrawElements++;
 			backEnd.pc.c_multiDrawPrimitives += tess.multiDrawPrimitives;

I get:

=== time: 108013, primitives: 243, time/primitives: 444.497955
=== time: 1716, primitives: 1, time/primitives: 1716.000000
=== time: 81, primitives: 2, time/primitives: 40.500000
=== time: 667, primitives: 8, time/primitives: 83.375000
=== time: 46, primitives: 3, time/primitives: 15.333333
=== time: 39, primitives: 6, time/primitives: 6.500000
=== time: 68, primitives: 2, time/primitives: 34.000000
=== time: 730, primitives: 40, time/primitives: 18.250000
=== time: 5799, primitives: 10, time/primitives: 579.900024
=== time: 10355, primitives: 10, time/primitives: 1035.500000
=== time: 12884, primitives: 21, time/primitives: 613.523804
=== time: 4710, primitives: 4, time/primitives: 1177.500000
=== time: 1542, primitives: 1, time/primitives: 1542.000000
=== time: 1443, primitives: 1, time/primitives: 1443.000000
=== time: 11180, primitives: 12, time/primitives: 931.666687
=== time: 187, primitives: 7, time/primitives: 26.714285
=== time: 34, primitives: 1, time/primitives: 34.000000
=== time: 37, primitives: 3, time/primitives: 12.333333
=== time: 69, primitives: 4, time/primitives: 17.250000
=== time: 33, primitives: 1, time/primitives: 33.000000
=== time: 141, primitives: 14, time/primitives: 10.071428
=== time: 79, primitives: 3, time/primitives: 26.333334
=== time: 109, primitives: 1, time/primitives: 109.000000
=== time: 1615, primitives: 2, time/primitives: 807.500000
=== time: 1671, primitives: 4, time/primitives: 417.750000
=== time: 2049, primitives: 7, time/primitives: 292.714294
=== time: 1658, primitives: 2, time/primitives: 829.000000
=== time: 3267, primitives: 2, time/primitives: 1633.500000
=== time: 3889, primitives: 2, time/primitives: 1944.500000
=== time: 34, primitives: 1, time/primitives: 34.000000
=== time: 12, primitives: 1, time/primitives: 12.000000
=== time: 44, primitives: 1, time/primitives: 44.000000
=== time: 17, primitives: 1, time/primitives: 17.000000
=== time: 2041, primitives: 1, time/primitives: 2041.000000
=== time: 44, primitives: 1, time/primitives: 44.000000
=== time: 1917, primitives: 1, time/primitives: 1917.000000
=== time: 64, primitives: 1, time/primitives: 64.000000
=== time: 74, primitives: 1, time/primitives: 74.000000
=== time: 30, primitives: 1, time/primitives: 30.000000
=== time: 1322, primitives: 3, time/primitives: 440.666656
=== time: 67, primitives: 1, time/primitives: 67.000000
=== time: 28, primitives: 7, time/primitives: 4.000000
=== time: 1740, primitives: 7, time/primitives: 248.571426
=== time: 91, primitives: 7, time/primitives: 13.000000
=== time: 48, primitives: 1, time/primitives: 48.000000

Some of them are really slow to render, 1 or 2ms for a single thing…

@illwieckz
Copy link
Member Author

illwieckz commented Jun 2, 2024

If I take a simple scene that achieves 47fps on the same Vega map (358 -1792 170 -180 0) , I get this:

=== time: 545, primitives: 28, time/primitives: 19.464285
=== time: 232, primitives: 9, time/primitives: 25.777779
=== time: 50, primitives: 1, time/primitives: 50.000000
=== time: 33, primitives: 2, time/primitives: 16.500000
=== time: 23, primitives: 2, time/primitives: 11.500000
=== time: 27, primitives: 2, time/primitives: 13.500000
=== time: 1516, primitives: 1, time/primitives: 1516.000000
=== time: 1322, primitives: 5, time/primitives: 264.399994
=== time: 1683, primitives: 2, time/primitives: 841.500000
=== time: 1925, primitives: 1, time/primitives: 1925.000000
=== time: 26, primitives: 1, time/primitives: 26.000000
=== time: 50, primitives: 1, time/primitives: 50.000000
=== time: 18, primitives: 1, time/primitives: 18.000000
=== time: 1613, primitives: 1, time/primitives: 1613.000000
=== time: 53, primitives: 1, time/primitives: 53.000000
=== time: 43, primitives: 1, time/primitives: 43.000000

There is much less things to do, but still things lasting more than 1ms and even close to 2ms.

So it looks like we have a very slow stuff somewhere, and having more tris just gives more chance to get those slow stuff to render. If we find them and fasten them, we may not only fix the huge performance drop but also the general performance.

@illwieckz
Copy link
Member Author

Now I need help to identify what are those things so slow to render…

@illwieckz
Copy link
Member Author

So I did this:

diff --git a/src/engine/renderer/tr_shade.cpp b/src/engine/renderer/tr_shade.cpp
index c99eb8369..8b4a46f45 100644
--- a/src/engine/renderer/tr_shade.cpp
+++ b/src/engine/renderer/tr_shade.cpp
@@ -793,6 +793,8 @@ void Render_generic3D( shaderStage_t *pStage )
 	GL_CheckErrors();
 }
 
+#include <ctime>
+#include <cstdio>
 void Render_generic( shaderStage_t *pStage )
 {
 	if ( backEnd.projection2D )
@@ -801,7 +803,10 @@ void Render_generic( shaderStage_t *pStage )
 		return;
 	}
 
+	clock_t start = clock();
 	Render_generic3D( pStage );
+	clock_t diff = clock() - start;
+	printf("=== time: %ld, %s\n", diff, tess.surfaceShader->name);
 }
 
 /*

And I got this:

=== time: 106604, <default>$depth
=== time: 1666, models/mapobjects/vega/globe/globe$depth
=== time: 123, models/mapobjects/vega/globe/globe
=== time: 58, models/mapobjects/vega/monitor/display_door_red
=== time: 14, models/mapobjects/vega/monitor/display_door_blue
=== time: 2043, textures/vega_custom/starchart02
=== time: 33, textures/vega_custom/wheeltracks
=== time: 86, textures/vega_custom/holo_projector
=== time: 34, textures/shared_vega/glass01
=== time: 1764, textures/shared_vega/glass01

@illwieckz
Copy link
Member Author

illwieckz commented Jun 2, 2024

I don't know why the $depth things are so slow, at least we notice that the globe has one, but the textures/vega_custom/starchart02 looks totally stupidly slow, it's the simplest material we can have:

textures/vega_custom/starchart02
{
	qer_editorImage textures/vega_custom_src/starchart02_d

	q3map_surfacelight 150

	surfaceparm nolightmap

	{
		map textures/vega_custom_src/starchart02_d
	}

}

Spending 2ms on rendering this is very wrong.

And for the glass01 that also looks stupidly slow, it's a bit more complex but not crazily complex:

textures/shared_vega/glass01
{
	qer_editorImage textures/shared_vega_src/glass01_b
	qer_trans .7

	cull none

	{
		map textures/shared_vega_src/glass01_env
		blendFunc gl_dst_color gl_one
		tcGen environment
	}

	{
		map textures/shared_vega_src/glass01_b
		blend filter
	}

	{
		map $lightmap
		blendFunc gl_dst_color gl_one
	}
}

Also, why one glass01 stage is fast, and one is slow?

And since the lightmap is rendered by Render_lightmap and not Render_generic3D, it looks like the slow glass01 stage is the second one, so this one:

	{
		map textures/shared_vega_src/glass01_b
		blend filter
	}

It doesn't make sense at all that this stage can take almost 1.7ms to render!

The simple fact everytime I interrupt I stop on a drmIoctl called by Render_generic3D looks stupid, this Render_generic3D function is for the simplest surfaces we have! Something looks to be badly wrong somewhere.

@illwieckz
Copy link
Member Author

Knowing the starchart02 shader could also be rendered by the Render_lightMapping (it's just supposedly more complex than Render_generic3D), I did that:

void Render_generic( shaderStage_t *pStage )
{
	if ( backEnd.projection2D )
	{
		Render_generic2D( pStage );
		return;
	}

+	if ( tess.surfaceShader->sort != Util::ordinal( shaderSort_t::SS_DEPTH ) )
+	{
+		Render_lightMapping( pStage );
+	}

	clock_t start = clock();
	Render_generic3D( pStage );
	clock_t diff = clock() - start;
	printf("=== time: %ld, %s\n", diff, tess.surfaceShader->name);
}

Of course it broke some other textures that cannot be rendered by Render_lightMapping, but I made sure it keeps the depth stage being rendered the usual way so outside of starchart02 I basically get some other textures being wrong but that's all.

And for starchart02, I got his time:

=== time: 72, textures/vega_custom/starchart02

And that texture looks to be prolerly rendered! The time to render it went from 2ms to 72ns! Something is definitely wrong in Render_generic3D, I may have found something big!

@slipher
Copy link
Member

slipher commented Jun 2, 2024

Is there some reason to believe that Render_generic3D is wrong, as opposed to the driver being crap?

@illwieckz
Copy link
Member Author

illwieckz commented Jun 2, 2024

The arguments in favor of the bug being in our code or at least triggered by our code:

  • Using Render_lightMapping from our own code for the same texture is 28 times faster.
  • ioquake3 renderer is 5 time faster to render what is believed be the same scene (in fact, probably more, as the renderer configuration was default-but-vertex-lighting on ioquake3 renderer side: higher texture size, higher LOD…).

This doesn't exclude the possibility the driver has an nasty bug that can be triggered by doing something as right as Renderer_lightMapping, but in a different way.

The fact Render_lightMapping renders the same material properly and faster means we can run a code that does the same but faster, helping to identify what is slower in the Renderer_generic3D, either finding something wrong in our code, either finding something that triggers the possible driver bug.

The argument in favor of the bug being in driver code:

  • Other drivers don't seem to suffer from this (or not as much), including the nouveau driver for a low end card of the same era.

For information why the starchart02 is rendered by Renderer_generic3D instead of Renderer_lightMapping in our renderer is that not only Renderer_lightMapping would run extra code for nothing like blending a fullwhite lightmap to produce the same result, but Render_lightMapping would incorrectly blend dynamic lights on this texture but the material has the nolightmap keyword, which means no light (including dynamic lights) should be blended over.

There is no reason Render_generic3D to be 28 times slower than Renderer_lightMapping while it is expected to even use a smaller GLSL code. So either we have a bug in our code, either something in our code triggers a driver bug. Identifying what triggers this would help to either fix the bug in our code, or report the bug to driver developers, or workaround the bug.

@illwieckz
Copy link
Member Author

illwieckz commented Jun 2, 2024

So I managed to record some run with apitrace, an painfully iterated the calls until I visually recognize the starchart02 texture, and copy-pasted the fragment and vertex shaders for that texture for both generic3D and lightMapping shader variants (I tweaked tr_shader.cpp to render twice that texture with one and the other). Then I did:

for i in *.glsl
do
	cpp $i | sed -e 's/^#.*//' | grep -Ev '^$' > pp.$i
done

Si I now have some preprocessed GLSL code for both cases.

Then I edited a bit the spacing in vp GLSL shaders to reduce the diff noise. I have not modified the fp GLSL shaders because they are too much different and no spacing change can help.

Here is preprocessed generic3D_fp.glsl:

float smoothstep(float edge0, float edge1, float x) { float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0); return t * t * (3.0 - 2.0 * t); }
const int MAX_GLSL_BONES = 41;
uniform sampler2D u_ColorMap;
uniform float u_AlphaThreshold;
uniform float u_InverseLightFactor;
varying vec2 var_TexCoords;
varying vec4 var_Color;
void main()
{
 vec4 color = texture2D(u_ColorMap, var_TexCoords);
 color *= var_Color;
 color.rgb *= u_InverseLightFactor;
 gl_FragColor = color;
}

Here is preprocessed lightMapping_fp.glsl:

float smoothstep(float edge0, float edge1, float x) { float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0); return t * t * (3.0 - 2.0 * t); }
const int MAX_GLSL_BONES = 41;
uniform sampler2D u_Lights;
const struct GetLightOffsets {
  int center_radius;
  int color_type;
  int direction_angle;
} getLightOffsets = GetLightOffsets(0, 1, 2);
uniform int u_numLights;
uniform vec2 u_SpecularExponent;
void ReadLightGrid(in vec4 texel, out vec3 ambientColor, out vec3 lightColor) {
 float ambientScale = 2.0 * texel.a;
 float directedScale = 2.0 - ambientScale;
 ambientColor = ambientScale * texel.rgb;
 lightColor = directedScale * texel.rgb;
}
void computeLight(in vec3 lightColor, vec4 diffuseColor, inout vec4 color) {
 color.rgb += lightColor.rgb * diffuseColor.rgb;
}
void computeDeluxeLight( vec3 lightDir, vec3 normal, vec3 viewDir, vec3 lightColor,
     vec4 diffuseColor, vec4 materialColor,
     inout vec4 color ) {
  vec3 H = normalize( lightDir + viewDir );
  float NdotL = dot( normal, lightDir );
  NdotL = clamp( NdotL, 0.0, 1.0 );
  color.rgb += lightColor.rgb * NdotL * diffuseColor.rgb;
}
const int lightsPerLayer = 4;
uniform sampler3D u_LightTiles;
vec4 fetchIdxs( in vec3 coords ) {
  return texture3D( u_LightTiles, coords ) * 255.0;
}
int nextIdx( inout vec4 idxs ) {
  vec4 tmp = idxs;
  idxs = floor(idxs * 0.25);
  tmp -= 4.0 * idxs;
  return int( dot( tmp, vec4( 64.0, 16.0, 4.0, 1.0 ) ) );
}
const int numLayers = 1024 / 256;
vec3 NormalInTangentSpace(vec2 texNormal)
{
 vec3 normal;
 normal = vec3(0.0, 0.0, 1.0);
 return normal;
}
vec3 NormalInWorldSpace(vec2 texNormal, mat3 tangentToWorldMatrix)
{
 vec3 normal = NormalInTangentSpace(texNormal);
 return normalize(tangentToWorldMatrix * normal);
}
uniform sampler2D u_DiffuseMap;
uniform sampler2D u_MaterialMap;
uniform sampler2D u_GlowMap;
uniform float u_AlphaThreshold;
uniform float u_InverseLightFactor;
uniform vec3 u_ViewOrigin;
varying vec3 var_Position;
varying vec2 var_TexCoords;
varying vec4 var_Color;
varying vec3 var_Tangent;
varying vec3 var_Binormal;
varying vec3 var_Normal;
 uniform sampler2D u_LightMap;
 varying vec2 var_TexLight;
void main()
{
 vec3 viewDir = normalize(u_ViewOrigin - var_Position);
 vec2 texCoords = var_TexCoords;
 mat3 tangentToWorldMatrix = mat3(var_Tangent.xyz, var_Binormal.xyz, var_Normal.xyz);
 vec4 diffuse = texture2D(u_DiffuseMap, texCoords);
 diffuse *= var_Color;
 if(abs(diffuse.a + u_AlphaThreshold) <= 1.0)
 {
  discard;
  return;
 }
 vec3 normal = NormalInWorldSpace(texCoords, tangentToWorldMatrix);
 vec4 material = texture2D(u_MaterialMap, texCoords);
 vec4 color;
 color.a = diffuse.a;
  vec3 lightColor = texture2D(u_LightMap, var_TexLight).rgb;
  color.rgb = vec3(0.0);
  computeLight(lightColor, diffuse, color);
 if ( u_InverseLightFactor > 0 )
 {
  color.rgb *= u_InverseLightFactor;
 }
  vec3 glow = texture2D(u_GlowMap, texCoords).rgb;
  if ( u_InverseLightFactor < 0 )
  {
   glow *= - u_InverseLightFactor;
  }
  color.rgb += glow;
 gl_FragColor = color;
}

Here is preprocessed generic3D_vp.glsl:

float waveSin(float x) {
 return sin( radians( 360.0 * x ) );
}
float waveSquare(float x) {
 return sign( waveSin( x ) );
}
float waveTriangle(float x)
{
 return 1.0 - abs( 4.0 * fract( x + 0.25 ) - 2.0 );
}
float waveSawtooth(float x)
{
 return fract( x );
}
void DeformVertex( inout vec4 pos,
     inout vec3 normal,
     inout vec2 st,
     inout vec4 color,
     in float time)
{
 vec4 work = vec4(0.0);
}
float smoothstep(float edge0, float edge1, float x) { float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0); return t * t * (3.0 - 2.0 * t); }
const int MAX_GLSL_BONES = 41;
struct localBasis {
 vec3 normal;
 vec3 tangent, binormal;
};
vec3 QuatTransVec(in vec4 quat, in vec3 vec) {
 vec3 tmp = 2.0 * cross( quat.xyz, vec );
 return vec + quat.w * tmp + cross( quat.xyz, tmp );
}
void QTangentToLocalBasis( in vec4 qtangent, out localBasis LB ) {
 LB.normal = QuatTransVec( qtangent, vec3( 0.0, 0.0, 1.0 ) );
 LB.tangent = QuatTransVec( qtangent, vec3( 1.0, 0.0, 0.0 ) );
 LB.tangent *= sign( qtangent.w );
 LB.binormal = QuatTransVec( qtangent, vec3( 0.0, 1.0, 0.0 ) );
}
attribute vec3 attr_Position;
attribute vec4 attr_Color;
attribute vec4 attr_QTangent;
attribute vec4 attr_TexCoord0;
void VertexFetch(out vec4 position,
   out localBasis normalBasis,
   out vec4 color,
   out vec2 texCoord,
   out vec2 lmCoord)
{
 position = vec4( attr_Position, 1.0 );
 QTangentToLocalBasis( attr_QTangent, normalBasis );
 color = attr_Color;
 texCoord = attr_TexCoord0.xy;
 lmCoord = attr_TexCoord0.zw;
}
uniform mat4 u_TextureMatrix;
uniform vec3 u_ViewOrigin;
uniform mat4 u_ModelViewProjectionMatrix;
uniform float u_Time;
uniform vec4 u_ColorModulate;
uniform vec4 u_Color;
varying vec2 var_TexCoords;
varying vec4 var_Color;
void DeformVertex(inout vec4 pos, inout vec3 normal, inout vec2 st, inout vec4 color, in float time);
void main()
{
 localBasis LB;
 vec4 position, color;
 vec2 texCoord, lmCoord;
 VertexFetch(position, LB, color, texCoord, lmCoord);
 color = color * u_ColorModulate + u_Color;
 DeformVertex(position, LB.normal, texCoord, color, u_Time);
 gl_Position = u_ModelViewProjectionMatrix * position;
 var_TexCoords = (u_TextureMatrix * vec4(texCoord, 0.0, 1.0)).xy;
 var_Color = color;
}}

Here is preprocessed lightMapping_vp.glsl:

float waveSin(float x) {
 return sin( radians( 360.0 * x ) );
}
float waveSquare(float x) {
 return sign( waveSin( x ) );
}
float waveTriangle(float x)
{
 return 1.0 - abs( 4.0 * fract( x + 0.25 ) - 2.0 );
}
float waveSawtooth(float x)
{
 return fract( x );
}
void DeformVertex( inout vec4 pos,
     inout vec3 normal,
     inout vec2 st,
     inout vec4 color,
     in float time)
{
 vec4 work = vec4(0.0);
}
float smoothstep(float edge0, float edge1, float x) { float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0); return t * t * (3.0 - 2.0 * t); }
const int MAX_GLSL_BONES = 41;
struct localBasis {
 vec3 normal;
 vec3 tangent, binormal;
};
vec3 QuatTransVec(in vec4 quat, in vec3 vec) {
 vec3 tmp = 2.0 * cross( quat.xyz, vec );
 return vec + quat.w * tmp + cross( quat.xyz, tmp );
}
void QTangentToLocalBasis( in vec4 qtangent, out localBasis LB ) {
 LB.normal = QuatTransVec( qtangent, vec3( 0.0, 0.0, 1.0 ) );
 LB.tangent = QuatTransVec( qtangent, vec3( 1.0, 0.0, 0.0 ) );
 LB.tangent *= sign( qtangent.w );
 LB.binormal = QuatTransVec( qtangent, vec3( 0.0, 1.0, 0.0 ) );
}
attribute vec3 attr_Position;
attribute vec4 attr_Color;
attribute vec4 attr_QTangent;
attribute vec4 attr_TexCoord0;
void VertexFetch(out vec4 position,
   out localBasis normalBasis,
   out vec4 color,
   out vec2 texCoord,
   out vec2 lmCoord)
{
 position = vec4( attr_Position, 1.0 );
 QTangentToLocalBasis( attr_QTangent, normalBasis );
 color = attr_Color;
 texCoord = attr_TexCoord0.xy;
 lmCoord = attr_TexCoord0.zw;
}
uniform mat4 u_TextureMatrix;
uniform mat4 u_ModelViewProjectionMatrix;
uniform float u_Time;
uniform vec4 u_ColorModulate;
uniform vec4 u_Color;
varying vec3 var_Position;
varying vec2 var_TexCoords;
varying vec2 var_TexLight;
varying vec3 var_Tangent;
varying vec3 var_Binormal;
varying vec3 var_Normal;
varying vec4 var_Color;
void DeformVertex(inout vec4 pos, inout vec3 normal, inout vec2 st, inout vec4 color, in float time);
void main()
{
 localBasis LB;
 vec4 position, color;
 vec2 texCoord, lmCoord;
 VertexFetch(position, LB, color, texCoord, lmCoord);
 color = color * u_ColorModulate + u_Color;
 DeformVertex(position, LB.normal, texCoord, color, u_Time);
 gl_Position = u_ModelViewProjectionMatrix * position;
  var_Position = position.xyz;
  var_Tangent = LB.tangent;
  var_Binormal = LB.binormal;
  var_Normal = LB.normal;
  var_TexLight = lmCoord;
 var_TexCoords = (u_TextureMatrix * vec4(texCoord, 0.0, 1.0)).xy;
 var_Color = color;
}

@illwieckz
Copy link
Member Author

So the diffs:

--- pp.generic3D_fp.glsl
+++ pp.lightMapping_fp.glsl
@@ -1,14 +1,95 @@
 float smoothstep(float edge0, float edge1, float x) { float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0); return t * t * (3.0 - 2.0 * t); }
 const int MAX_GLSL_BONES = 41;
-uniform sampler2D u_ColorMap;
+uniform sampler2D u_Lights;
+const struct GetLightOffsets {
+  int center_radius;
+  int color_type;
+  int direction_angle;
+} getLightOffsets = GetLightOffsets(0, 1, 2);
+uniform int u_numLights;
+uniform vec2 u_SpecularExponent;
+void ReadLightGrid(in vec4 texel, out vec3 ambientColor, out vec3 lightColor) {
+ float ambientScale = 2.0 * texel.a;
+ float directedScale = 2.0 - ambientScale;
+ ambientColor = ambientScale * texel.rgb;
+ lightColor = directedScale * texel.rgb;
+}
+void computeLight(in vec3 lightColor, vec4 diffuseColor, inout vec4 color) {
+ color.rgb += lightColor.rgb * diffuseColor.rgb;
+}
+void computeDeluxeLight( vec3 lightDir, vec3 normal, vec3 viewDir, vec3 lightColor,
+     vec4 diffuseColor, vec4 materialColor,
+     inout vec4 color ) {
+  vec3 H = normalize( lightDir + viewDir );
+  float NdotL = dot( normal, lightDir );
+  NdotL = clamp( NdotL, 0.0, 1.0 );
+  color.rgb += lightColor.rgb * NdotL * diffuseColor.rgb;
+}
+const int lightsPerLayer = 4;
+uniform sampler3D u_LightTiles;
+vec4 fetchIdxs( in vec3 coords ) {
+  return texture3D( u_LightTiles, coords ) * 255.0;
+}
+int nextIdx( inout vec4 idxs ) {
+  vec4 tmp = idxs;
+  idxs = floor(idxs * 0.25);
+  tmp -= 4.0 * idxs;
+  return int( dot( tmp, vec4( 64.0, 16.0, 4.0, 1.0 ) ) );
+}
+const int numLayers = 1024 / 256;
+vec3 NormalInTangentSpace(vec2 texNormal)
+{
+ vec3 normal;
+ normal = vec3(0.0, 0.0, 1.0);
+ return normal;
+}
+vec3 NormalInWorldSpace(vec2 texNormal, mat3 tangentToWorldMatrix)
+{
+ vec3 normal = NormalInTangentSpace(texNormal);
+ return normalize(tangentToWorldMatrix * normal);
+}
+uniform sampler2D u_DiffuseMap;
+uniform sampler2D u_MaterialMap;
+uniform sampler2D u_GlowMap;
 uniform float u_AlphaThreshold;
 uniform float u_InverseLightFactor;
+uniform vec3 u_ViewOrigin;
+varying vec3 var_Position;
 varying vec2 var_TexCoords;
 varying vec4 var_Color;
+varying vec3 var_Tangent;
+varying vec3 var_Binormal;
+varying vec3 var_Normal;
+ uniform sampler2D u_LightMap;
+ varying vec2 var_TexLight;
 void main()
 {
- vec4 color = texture2D(u_ColorMap, var_TexCoords);
- color *= var_Color;
- color.rgb *= u_InverseLightFactor;
+ vec3 viewDir = normalize(u_ViewOrigin - var_Position);
+ vec2 texCoords = var_TexCoords;
+ mat3 tangentToWorldMatrix = mat3(var_Tangent.xyz, var_Binormal.xyz, var_Normal.xyz);
+ vec4 diffuse = texture2D(u_DiffuseMap, texCoords);
+ diffuse *= var_Color;
+ if(abs(diffuse.a + u_AlphaThreshold) <= 1.0)
+ {
+  discard;
+  return;
+ }
+ vec3 normal = NormalInWorldSpace(texCoords, tangentToWorldMatrix);
+ vec4 material = texture2D(u_MaterialMap, texCoords);
+ vec4 color;
+ color.a = diffuse.a;
+  vec3 lightColor = texture2D(u_LightMap, var_TexLight).rgb;
+  color.rgb = vec3(0.0);
+  computeLight(lightColor, diffuse, color);
+ if ( u_InverseLightFactor > 0 )
+ {
+  color.rgb *= u_InverseLightFactor;
+ }
+  vec3 glow = texture2D(u_GlowMap, texCoords).rgb;
+  if ( u_InverseLightFactor < 0 )
+  {
+   glow *= - u_InverseLightFactor;
+  }
+  color.rgb += glow;
  gl_FragColor = color;
 }
--- pps.generic3D_vp.glsl
+++ pps.lightMapping_vp.glsl
@@ -1,75 +1,84 @@
 float waveSin(float x) {
  return sin( radians( 360.0 * x ) );
 }
 float waveSquare(float x) {
  return sign( waveSin( x ) );
 }
 float waveTriangle(float x)
 {
  return 1.0 - abs( 4.0 * fract( x + 0.25 ) - 2.0 );
 }
 float waveSawtooth(float x)
 {
  return fract( x );
 }
 void DeformVertex( inout vec4 pos,
      inout vec3 normal,
      inout vec2 st,
      inout vec4 color,
      in float time)
 {
  vec4 work = vec4(0.0);
 }
 float smoothstep(float edge0, float edge1, float x) { float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0); return t * t * (3.0 - 2.0 * t); }
 const int MAX_GLSL_BONES = 41;
 struct localBasis {
  vec3 normal;
  vec3 tangent, binormal;
 };
 vec3 QuatTransVec(in vec4 quat, in vec3 vec) {
  vec3 tmp = 2.0 * cross( quat.xyz, vec );
  return vec + quat.w * tmp + cross( quat.xyz, tmp );
 }
 void QTangentToLocalBasis( in vec4 qtangent, out localBasis LB ) {
  LB.normal = QuatTransVec( qtangent, vec3( 0.0, 0.0, 1.0 ) );
  LB.tangent = QuatTransVec( qtangent, vec3( 1.0, 0.0, 0.0 ) );
  LB.tangent *= sign( qtangent.w );
  LB.binormal = QuatTransVec( qtangent, vec3( 0.0, 1.0, 0.0 ) );
 }
 attribute vec3 attr_Position;
 attribute vec4 attr_Color;
 attribute vec4 attr_QTangent;
 attribute vec4 attr_TexCoord0;
 void VertexFetch(out vec4 position,
    out localBasis normalBasis,
    out vec4 color,
    out vec2 texCoord,
    out vec2 lmCoord)
 {
  position = vec4( attr_Position, 1.0 );
  QTangentToLocalBasis( attr_QTangent, normalBasis );
  color = attr_Color;
  texCoord = attr_TexCoord0.xy;
  lmCoord = attr_TexCoord0.zw;
 }
 uniform mat4 u_TextureMatrix;
-uniform vec3 u_ViewOrigin;
 uniform mat4 u_ModelViewProjectionMatrix;
 uniform float u_Time;
 uniform vec4 u_ColorModulate;
 uniform vec4 u_Color;
+varying vec3 var_Position;
 varying vec2 var_TexCoords;
+varying vec2 var_TexLight;
+varying vec3 var_Tangent;
+varying vec3 var_Binormal;
+varying vec3 var_Normal;
 varying vec4 var_Color;
 void DeformVertex(inout vec4 pos, inout vec3 normal, inout vec2 st, inout vec4 color, in float time);
 void main()
 {
  localBasis LB;
  vec4 position, color;
  vec2 texCoord, lmCoord;
  VertexFetch(position, LB, color, texCoord, lmCoord);
  color = color * u_ColorModulate + u_Color;
  DeformVertex(position, LB.normal, texCoord, color, u_Time);
  gl_Position = u_ModelViewProjectionMatrix * position;
+  var_Position = position.xyz;
+  var_Tangent = LB.tangent;
+  var_Binormal = LB.binormal;
+  var_Normal = LB.normal;
+  var_TexLight = lmCoord;
  var_TexCoords = (u_TextureMatrix * vec4(texCoord, 0.0, 1.0)).xy;
  var_Color = color;
 }

So it looks like there is no real difference between the generic3D and lightMapping GLSL code, the second one just computes more things.

If something is slower in generic3D, then it must be elsewhere.

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

This doesn't make sense:

=== time: 101197, <default>$depth:0
=== time: 1675, models/mapobjects/vega/globe/globe$depth:0
=== time: 3539, textures/vega_custom/starchart02:0

The single first line means only 10fps is left after only rendering <default>$depth. Then after rendering other things and processing the whole game, it's not surprising it's running at 5fps…

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

I wondered if the issue came from the OpenGL driver switching GLSL permutations, so I did that:

// choose right shader program ----------------------------------
gl_genericShader->SetVertexSkinning( 0 ); //glConfig2.vboVertexSkinningAvailable && tess.vboVertexSkinning );
gl_genericShader->SetVertexAnimation( 0 ); //tess.vboVertexAnimation );
gl_genericShader->SetTCGenEnvironment( 0 ); //pStage->tcGen_Environment );
gl_genericShader->SetTCGenLightmap( 0 ); //pStage->tcGen_Lightmap );
gl_genericShader->SetDepthFade( 0 ); //hasDepthFade );
gl_genericShader->SetVertexSprite( 0 ); //tess.vboVertexSprite );
gl_genericShader->BindProgram( 0 ); //pStage->deformIndex );
// end choose right shader program ------------------------------

It doesn't bring any performance bump on rendering the default spectator Vega scene (472 -1792 176 -180 0), while it bumps the performance runs from 30 to 60 fps of the rendering of a birdview of the whole Plat23 map (0 3568 3194 -90 66) and those 60fps are even kept when r_novis and r_nocull are enabled… So yes the shader permutations seems to be costly, but that's probably not the cause of the performance loss on Vega.

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

But I can render the whole plat23 map (0 3568 3194 -90 66) with all tris visible, and keep a smooth 30fps:

For comparing things like that it'd be helpful to know how many triangles are in each case in particular (some value of r_speeds shows that I believe).

The following test was running the previously quoted change to avoid switching between shader permutations. This boosted the Plat23 bird view and doubled its framerate, but changed nothing for Vega. I enabled r_speeds to get the amount of stuff processed. It looks like event the Plat23 birdview scene has less tris than the default Vega spectator scene, but then disabling vis and culling on that Plat23 birdview scene doesn't slow down the engine despite processing more tris than what the default Vega scene does with vis and culling enabled.

Plat23 bird view, 0 3568 3194 -90 66, 60fps:

unvanquished_2024-06-03_172749_000

  • cg_draw2D 1, r_novis 0, r_nocull 0
1 views 0 portals 98 batches 4288 surfs 661 leafs 4484 verts 35494 tris
0 lights 0 bout 0 pvsout 0 interactions
50 draws 3 vbos 3 ibos 4484 verts 2242 tris
61 multidraws 1718 primitives 33252 tris
  • cg_draw2D 1, r_novis 1, r_nocull 1
1 views 0 portals 99 batches 4880 surfs 661 leafs 4284 verts 36594 tris
0 lights 0 bout 0 pvsout 0 interactions
50 draws 3 vbos 3 ibos 4284 verts 2142 tris
62 multidraws 1883 primitives 34452 tris
  • cg_draw2D 0, r_novis 0, r_nocull 0
1 views 0 portals 33 batches 1376 surfs 191 leafs 0 verts 12516 tris
0 lights 0 bout 0 pvsout 0 interactions
0 draws 2 vbos 2 ibos 0 verts 0 tris
38 multidraws 561 primitives 12516 tris
  • cg_draw2D 0, r_novis 1, r_nocull 1
1 views 0 portals 49 batches 4880 surfs 661 leafs 0 verts 34452 tri
0 lights 0 bout 0 pvsout 0 interactions
0 draws 2 vbos 2 ibos 0 verts 0 tris
62 multidraws 1883 primitives 34452 tris

Vega default spectator scene, 1472 -1792 176 -180 0, 5fps:

unvanquished_2024-06-03_174338_000

  • cg_draw2D 1, r_novis 0, r_nocull 0
1 views 0 portals 96 batches 1098 surfs 51 leafs 4476 verts 31420 tris
0 lights 0 bout 0 pvsout 0 interactions
53 draws 3 vbos 3 ibos 4476 verts 2238 tris
45 multidraws 453 primitives 29182 tris
  • cg_draw2D 0, r_novis 0, r_nocull 0
1 views 0 portals 43 batches 1098 surfs 51 leafs 0 verts 29182 tris
0 lights 0 bout 0 pvsout 0 interactions
0 draws 2 vbos 2 ibos 0 verts 0 tris
45 multidraws 453 primitives 29182 tris

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

So yes the shader permutations seems to be costly, but that's probably not the cause of the performance loss on Vega.

Forget about that… reverting that HACK doesn't slow down the Plat23 map. I now get 60 fps on default Plat23 spectator scene and 33 fps on both Human and Alien base entry scenes… This huge performance boost on default Plat23 scene likely comes from the amount of improvements recently merged or to-be-merged that now lives in the development branch I rebase my current work over.

So, it looks like we are now reaching on Plat23 the performance of the ioq3 opengl2 renderer, so that Vega default spectator slow down is really something wrong.

@illwieckz
Copy link
Member Author

Default Yocto spectator scene (-488 1496 760 -90 0) also suffers from this slow down (I also get 5fps), and the amount of tris is incredibly higher, but we're not incredibly slower than Vega (same 5fps).

  • cg_draw2D 1
1 views 0 portals 138 batches 1214 surfs 13 leafs 4560 verts 198358 tris
0 lights 0 bout 0 pvsout 0 interactions
50 draws 14 vbos 14 ibos 4560 verts 2280 tris
93 multidraws 410 primitives 196078 tris
  • cg_draw2D 0
1 views 0 portals 88 batches 1214 surfs 13 leafs 0 verts 196078 tris 
0 lights 0 bout 0 pvsout 0 interactions 
0 draws 13 vbos 13 ibos 0 verts 0 tris 
93 multidraws 410 primitives 196078 tris 

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

With:

diff --git a/src/engine/renderer/tr_backend.cpp b/src/engine/renderer/tr_backend.cpp
index 4c43547d5..5823a8462 100644
--- a/src/engine/renderer/tr_backend.cpp
+++ b/src/engine/renderer/tr_backend.cpp
@@ -4789,6 +4789,7 @@ static void RB_RenderView( bool depthPass )
 	}
 
 	if( depthPass ) {
+		glFinish();
 		RB_RenderDrawSurfaces( shaderSort_t::SS_DEPTH, shaderSort_t::SS_DEPTH, DRAWSURFACES_ALL );
 		RB_RunVisTests();
 		RB_RenderPostDepthLightTile();
diff --git a/src/engine/renderer/tr_shade.cpp b/src/engine/renderer/tr_shade.cpp
index dc3bf44a9..7cec1886a 100644
--- a/src/engine/renderer/tr_shade.cpp
+++ b/src/engine/renderer/tr_shade.cpp
@@ -327,7 +327,10 @@ void Tess_DrawElements()
 	{
 		if ( tess.multiDrawPrimitives )
 		{
+			clock_t start = clock();
 			glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+			clock_t diff = clock() - start;
+			printf("=== time: %ld, %s\n", diff, tess.surfaceShader->name );
 
 			backEnd.pc.c_multiDrawElements++;
 			backEnd.pc.c_multiDrawPrimitives += tess.multiDrawPrimitives;

I get:

=== time: 94926, <default>$depth
=== time: 1533, models/mapobjects/vega/globe/globe$depth
=== time: 63, textures/shared_vega/light01_300
=== time: 594, textures/shared_vega/panel01
=== time: 34, textures/shared_vega/trim07
=== time: 25, textures/shared_vega/trim01
=== time: 36, textures/shared_vega/panel07
=== time: 666, textures/shared_vega/base01
=== time: 6450, textures/shared_vega/base03a
=== time: 9170, textures/shared_vega/base01a
=== time: 10176, textures/shared_vega/trim03a
=== time: 5435, textures/shared_vega/base03
=== time: 1332, textures/shared_vega/trim02
=== time: 1340, textures/shared_vega/panel01a
=== time: 8639, textures/shared_vega/trim04
=== time: 125, textures/shared_vega/base02
=== time: 18, textures/shared_vega/trim02a
=== time: 28, textures/shared_vega/panel06_300
=== time: 57, textures/shared_vega/trim03
=== time: 21, textures/shared_vega/trim03b
=== time: 128, textures/shared_vega/concrete01
=== time: 69, textures/shared_vega/trim01a
=== time: 88, models/mapobjects/vega/globe/globe
=== time: 2175, models/mapobjects/vega/container/container2
=== time: 1853, textures/shared_vega/floor02
=== time: 3166, textures/shared_vega/floor01
=== time: 2364, textures/shared_vega/dirt01
=== time: 3099, textures/shared_vega/light03_1500
=== time: 1671, textures/shared_vega/panel05_300
=== time: 24, textures/shared_vega/panel02
=== time: 11, textures/shared_vega/asphalt01
=== time: 34, models/mapobjects/vega/monitor/display_door_red
=== time: 9, models/mapobjects/vega/monitor/display_door_blue
=== time: 1798, models/mapobjects/vega/monitor/monitor
=== time: 30, models/mapobjects/vega/emitter/emitter
=== time: 1813, textures/vega_custom/starchart02
=== time: 47, textures/vega_custom/roadstrip
=== time: 22, textures/vega_custom/wheeltracks
=== time: 19, textures/vega_custom/vega_banner
=== time: 1552, textures/shared_vega/grate01
=== time: 45, textures/vega_custom/holo_projector
=== time: 21, textures/shared_vega/glass01
=== time: 1590, textures/shared_vega/glass01
=== time: 74, textures/shared_vega/glass01
=== time: 39, textures/shared_vega/glass01

And breaking the execution still breaks in drmIoctl as child of glMultiDrawElements:

Thread 1 (Thread 0x7beb293f5a80 (LWP 207096) "daemon"):
#0  __GI___ioctl (fd=fd@entry=15, request=request@entry=3223348317) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x00007beb2b04b0a0 in drmIoctl (fd=15, request=request@entry=3223348317, arg=arg@entry=0x7ffe45373a00) at ../xf86drm.c:704
#2  0x00007beb2b04e260 in drmCommandWriteRead (fd=<optimized out>, drmCommandIndex=drmCommandIndex@entry=29, data=data@entry=0x7ffe45373a00, size=size@entry=32) at ../xf86drm.c:3186
#3  0x00007beb05c0e013 in radeon_create_bo (rws=rws@entry=0x64ba2c42eb20, size=size@entry=3063808, alignment=alignment@entry=4096, initial_domains=initial_domains@entry=2, flags=flags@entry=16, heap=heap@entry=0) at ../src/gallium/winsys/radeon/drm/radeon_drm_bo.c:632
#4  0x00007beb05c0e7c5 in radeon_winsys_bo_create (rws=0x64ba2c42eb20, size=3063808, alignment=<optimized out>, domain=<optimized out>, flags=<optimized out>) at ../src/gallium/winsys/radeon/drm/radeon_drm_bo.c:1043
#5  0x00007beb05bd25e4 in r300_buffer_create (screen=0x64ba2c42fa90, templ=0x7ffe45373b30) at ../src/gallium/drivers/r300/r300_screen_buffer.c:191
#6  0x00007beb05a9b6a3 in u_upload_alloc_buffer (min_size=3063808, upload=0x64ba2c568be0) at ../src/gallium/auxiliary/util/u_upload_mgr.c:208
#7  u_upload_alloc (upload=0x64ba2c568be0, min_out_offset=<optimized out>, size=640, alignment=alignment@entry=4, out_offset=out_offset@entry=0x7ffe45373c74, outbuf=outbuf@entry=0x7ffe45373c78, ptr=0x7ffe45373c80) at ../src/gallium/auxiliary/util/u_upload_mgr.c:273
#8  0x00007beb05a9cca3 in u_vbuf_translate_buffers (unroll_indices=false, min_index=<optimized out>, num_vertices=40, start_vertex=191448, out_vb=1, vb_mask=<optimized out>, draw=<optimized out>, info=<optimized out>, key=<optimized out>, mgr=0x64ba2c55d340) at ../src/gallium/auxiliary/util/u_vbuf.c:587
#9  u_vbuf_translate_begin (mgr=mgr@entry=0x64ba2c55d340, info=info@entry=0x7ffe45374b30, draw=draw@entry=0x7ffe45374b14, start_vertex=start_vertex@entry=191448, num_vertices=num_vertices@entry=40, min_index=<optimized out>, unroll_indices=<optimized out>, misaligned=<optimized out>) at ../src/gallium/auxiliary/util/u_vbuf.c:805
#10 0x00007beb05a9f120 in u_vbuf_draw_vbo (pipe=0x64ba2c567270, info=0x7ffe45374cc0, drawid_offset=4, indirect=0x0, draws=<optimized out>, num_draws=7) at ../src/gallium/auxiliary/util/u_vbuf.c:1726
#11 0x00007beb056c3cb3 in _mesa_validated_multidrawelements (ctx=ctx@entry=0x64ba2c4e03e0, index_bo=<optimized out>, mode=mode@entry=4, count=count@entry=0x64ba2b745490 <tess+8432>, type=type@entry=5125, indices=indices@entry=0x64ba2b743550 <tess+432>, primcount=<optimized out>, basevertex=0x0) at ../src/mesa/main/draw.c:2118
#12 0x00007beb056c5bc9 in _mesa_MultiDrawElements (mode=4, count=0x64ba2b745490 <tess+8432>, type=5125, indices=0x64ba2b743550 <tess+432>, primcount=7) at ../src/mesa/main/draw.c:2171
#13 0x000064ba2ab93d90 in Tess_DrawElements () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:330
#14 0x000064ba2ab96d7c in Render_lightMapping (pStage=0x7beb0b31e180) at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:1245
#15 0x000064ba2ab9d6cb in Tess_StageIteratorColor () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:2624
#16 0x000064ba2ab9df23 in Tess_End () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:2860
#17 0x000064ba2ab2e0a9 in RB_RenderDrawSurfaces (fromSort=fromSort@entry=shaderSort_t::SS_ENVIRONMENT_FOG, toSort=toSort@entry=shaderSort_t::SS_OPAQUE, drawSurfFilter=drawSurfFilter@entry=DRAWSURFACES_ALL) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:903
#18 0x000064ba2ab3e2af in RB_RenderView (depthPass=false) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:4814
#19 DrawViewCommand::ExecuteSelf (this=0x7beb0ae90364) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:5701
#20 0x000064ba2ab30f09 in RB_ExecuteRenderCommands (data=<optimized out>) at Unvanquished/daemon/src/engine/renderer/tr_backend.cpp:5951
#21 0x000064ba2ab50a12 in RE_EndFrame (frontEndMsec=0x0, backEndMsec=0x0) at Unvanquished/daemon/src/engine/renderer/tr_cmds.cpp:889
#22 0x000064ba2aad0f70 in SCR_UpdateScreen () at Unvanquished/daemon/src/engine/client/cl_scrn.cpp:327
#23 0x000064ba2aac58bd in CL_Frame (msec=msec@entry=146) at Unvanquished/daemon/src/engine/client/cl_main.cpp:2087
#24 0x000064ba2aa57f21 in Com_Frame () at Unvanquished/daemon/src/engine/qcommon/common.cpp:958
#25 0x000064ba2aa5151d in Application::ClientApplication::Frame (this=0x64ba2ad26e00 <Application::GetApp()::app>) at Unvanquished/daemon/src/engine/client/ClientApplication.cpp:96
#26 0x000064ba2aa50e45 in main (argc=<optimized out>, argv=<optimized out>) at Unvanquished/daemon/src/engine/framework/System.cpp:784

@illwieckz
Copy link
Member Author

For comparison, this is the timing log of the Plat23 whole map birdview:

=== time: 5766, <default>$depth
=== time: 370, textures/shared_pk02/panels01c
=== time: 215, textures/shared_pk02/trim03
=== time: 39, textures/shared_pk02/wall05c
=== time: 437, textures/shared_pk02/trim04a
=== time: 76, textures/shared_pk02/wall08a
=== time: 1114, textures/shared_pk02/generic01b
=== time: 49, textures/shared_pk02/wall_big02a
=== time: 479, textures/shared_pk02/trim04b
=== time: 42, textures/plat23_pk02/pipes01_250
=== time: 131, textures/shared_pk02/trim02
=== time: 89, textures/plat23_pk02/light03_blue_750
=== time: 99, textures/plat23_pk02/light03_blue_750
=== time: 63, textures/plat23_pk02/light03_red_750
=== time: 55, textures/plat23_pk02/light03_red_750
=== time: 173, textures/shared_pk02/ceiling03
=== time: 42, textures/shared_pk02/panels01b
=== time: 214, textures/shared_pk02/wall_big02b
=== time: 35, textures/shared_pk02/wall06a
=== time: 34, textures/shared_pk02/wall03b
=== time: 197, textures/shared_pk02/trim01
=== time: 22, textures/shared_pk02/switches01a
=== time: 21, textures/shared_pk02/sand01
=== time: 15, textures/shared_pk02/floor_cyl01
=== time: 22, textures/shared_pk02/floor12b
=== time: 41, textures/shared_pk02/floor12a
=== time: 15, textures/shared_pk02/floor09a
=== time: 83, textures/shared_pk02/floor06a
=== time: 47, textures/shared_pk02/floor05
=== time: 26, textures/shared_pk02/floor04
=== time: 62, textures/shared_pk02/floor03
=== time: 30, textures/shared_pk02/floor01
=== time: 34, textures/plat23_pk02/switches01b_250
=== time: 36, textures/plat23_pk02/switches01b_250
=== time: 36, textures/plat23_pk02/rock01_terrain_yz
=== time: 71, textures/plat23_pk02/rock01_terrain_xz
=== time: 6, textures/plat23_pk02/rock01_terrain_xy
=== time: 26, textures/plat23_pk02/light02_yellow_3000
=== time: 31, textures/plat23_pk02/light02_yellow_3000
=== time: 24, textures/plat23_pk02/light02_yellow_2000
=== time: 35, textures/plat23_pk02/light02_yellow_2000
=== time: 22, textures/plat23_pk02/light02_orange_3000
=== time: 32, textures/plat23_pk02/light02_orange_3000
=== time: 58, textures/plat23_pk02/light02_orange_2000
=== time: 27, textures/plat23_pk02/light02_orange_2000
=== time: 28, textures/plat23_pk02/light01_yellow_5000
=== time: 23, textures/plat23_pk02/light01_yellow_5000
=== time: 38, textures/plat23_pk02/light01_yellow_2000
=== time: 36, textures/plat23_pk02/light01_yellow_2000
=== time: 27, textures/plat23_pk02/light01_red_2000
=== time: 24, textures/plat23_pk02/light01_red_2000
=== time: 60, textures/plat23_pk02/light01_blue_2000
=== time: 37, textures/plat23_pk02/light01_blue_2000
=== time: 35, textures/plat23_pk02/computer01a_250
=== time: 20, textures/plat23_pk02/computer01a_250
=== time: 23, textures/plat23_pk02/dec_num03
=== time: 33, textures/plat23_pk02/dec_num02
=== time: 18, textures/plat23_custom/lol
=== time: 36, textures/plat23_custom/forcefield
=== time: 18, textures/plat23_custom/forcefield
=== time: 66, textures/plat23_custom/forcefield

@illwieckz
Copy link
Member Author

I did that:

--- a/src/engine/renderer/tr_shade.cpp
+++ b/src/engine/renderer/tr_shade.cpp
@@ -327,7 +327,15 @@ void Tess_DrawElements()
 	{
 		if ( tess.multiDrawPrimitives )
 		{
-			glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+			// glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+
+			for ( i = 0; i < tess.multiDrawPrimitives; i++ )
+			{
+				clock_t start = clock();
+				glDrawElements( GL_TRIANGLES, tess.multiDrawCounts[ i ], GL_INDEX_TYPE, tess.multiDrawIndexes[ i ] );
+				clock_t diff = clock() - start;
+				printf("=== time: %ld, %s, %d\n", diff, tess.surfaceShader->name, i );
+			}
 
 			backEnd.pc.c_multiDrawElements++;
 			backEnd.pc.c_multiDrawPrimitives += tess.multiDrawPrimitives;

And now the log is:

=== time: 1382, <default>$depth, 0
=== time: 42, <default>$depth, 1
=== time: 32, <default>$depth, 2
=== time: 15, <default>$depth, 3
=== time: 19, <default>$depth, 4
=== time: 16, <default>$depth, 5
=== time: 13, <default>$depth, 6
=== time: 25, <default>$depth, 7
=== time: 11, <default>$depth, 8
=== time: 21, <default>$depth, 9
=== time: 14, <default>$depth, 10
=== time: 47, <default>$depth, 11
=== time: 15, <default>$depth, 12
=== time: 6, <default>$depth, 13
=== time: 9, <default>$depth, 14
=== time: 5, <default>$depth, 15
=== time: 16, <default>$depth, 16
=== time: 22, <default>$depth, 17
=== time: 22, <default>$depth, 18
=== time: 40, <default>$depth, 19
=== time: 18, <default>$depth, 20
=== time: 15, <default>$depth, 21
=== time: 8, <default>$depth, 22
=== time: 13, <default>$depth, 23
=== time: 6, <default>$depth, 24
=== time: 7, <default>$depth, 25
=== time: 22, <default>$depth, 26
=== time: 5, <default>$depth, 27
=== time: 12, <default>$depth, 28
=== time: 8, <default>$depth, 29
=== time: 7, <default>$depth, 30
=== time: 7, <default>$depth, 31
=== time: 21, <default>$depth, 32
=== time: 61, <default>$depth, 33
=== time: 26, <default>$depth, 34
=== time: 43, <default>$depth, 35
=== time: 11, <default>$depth, 36
=== time: 14, <default>$depth, 37
=== time: 8, <default>$depth, 38
=== time: 5, <default>$depth, 39
=== time: 11, <default>$depth, 40
=== time: 22, <default>$depth, 41
=== time: 5, <default>$depth, 42
=== time: 11, <default>$depth, 43
=== time: 13, <default>$depth, 44
=== time: 12, <default>$depth, 45
=== time: 5, <default>$depth, 46
=== time: 7, <default>$depth, 47
=== time: 12, <default>$depth, 48
=== time: 12, <default>$depth, 49
=== time: 5, <default>$depth, 50
=== time: 6, <default>$depth, 51
=== time: 38, <default>$depth, 52
=== time: 12, <default>$depth, 53
=== time: 14, <default>$depth, 54
=== time: 7, <default>$depth, 55
=== time: 5, <default>$depth, 56
=== time: 6, <default>$depth, 57
=== time: 5, <default>$depth, 58
=== time: 5, <default>$depth, 59
=== time: 8, <default>$depth, 60
=== time: 9, <default>$depth, 61
=== time: 14, <default>$depth, 62
=== time: 16, <default>$depth, 63
=== time: 12, <default>$depth, 64
=== time: 11, <default>$depth, 65
=== time: 5, <default>$depth, 66
=== time: 11, <default>$depth, 67
=== time: 7, <default>$depth, 68
=== time: 12, <default>$depth, 69
=== time: 12, <default>$depth, 70
=== time: 16, <default>$depth, 71
=== time: 6, <default>$depth, 72
=== time: 71, <default>$depth, 73
=== time: 12, <default>$depth, 74
=== time: 11, <default>$depth, 75
=== time: 38, <default>$depth, 76
=== time: 14, <default>$depth, 77
=== time: 27, <default>$depth, 78
=== time: 14, <default>$depth, 79
=== time: 109, <default>$depth, 80
=== time: 17, <default>$depth, 81
=== time: 5, <default>$depth, 82
=== time: 5, <default>$depth, 83
=== time: 1775, <default>$depth, 84
=== time: 1829, <default>$depth, 85
=== time: 13, <default>$depth, 86
=== time: 9, <default>$depth, 87
=== time: 5, <default>$depth, 88
=== time: 7, <default>$depth, 89
=== time: 14, <default>$depth, 90
=== time: 19, <default>$depth, 91
=== time: 5, <default>$depth, 92
=== time: 1744, <default>$depth, 93
=== time: 20, <default>$depth, 94
=== time: 5, <default>$depth, 95
=== time: 6, <default>$depth, 96
=== time: 6, <default>$depth, 97
=== time: 591, <default>$depth, 98
=== time: 703, <default>$depth, 99
=== time: 1791, <default>$depth, 100
=== time: 12, <default>$depth, 101
=== time: 57, <default>$depth, 102
=== time: 1405, <default>$depth, 103
=== time: 1543, <default>$depth, 104
=== time: 56, <default>$depth, 105
=== time: 1358, <default>$depth, 106
=== time: 11, <default>$depth, 107
=== time: 1966, <default>$depth, 108
=== time: 22, <default>$depth, 109
=== time: 575, <default>$depth, 110
=== time: 14, <default>$depth, 111
=== time: 683, <default>$depth, 112
=== time: 20, <default>$depth, 113
=== time: 6, <default>$depth, 114
=== time: 5, <default>$depth, 115
=== time: 5, <default>$depth, 116
=== time: 5, <default>$depth, 117
=== time: 29, <default>$depth, 118
=== time: 591, <default>$depth, 119
=== time: 2651, <default>$depth, 120
=== time: 1835, <default>$depth, 121
=== time: 11, <default>$depth, 122
=== time: 1168, <default>$depth, 123
=== time: 1022, <default>$depth, 124
=== time: 1513, <default>$depth, 125
=== time: 13, <default>$depth, 126
=== time: 1840, <default>$depth, 127
=== time: 17, <default>$depth, 128
=== time: 822, <default>$depth, 129
=== time: 1260, <default>$depth, 130
=== time: 745, <default>$depth, 131
=== time: 1107, <default>$depth, 132
=== time: 12, <default>$depth, 133
=== time: 1515, <default>$depth, 134
=== time: 14, <default>$depth, 135
=== time: 618, <default>$depth, 136
=== time: 691, <default>$depth, 137
=== time: 1466, <default>$depth, 138
=== time: 13, <default>$depth, 139
=== time: 5, <default>$depth, 140
=== time: 5, <default>$depth, 141
=== time: 5, <default>$depth, 142
=== time: 7, <default>$depth, 143
=== time: 7, <default>$depth, 144
=== time: 5, <default>$depth, 145
=== time: 1466, <default>$depth, 146
=== time: 26, <default>$depth, 147
=== time: 5, <default>$depth, 148
=== time: 5, <default>$depth, 149
=== time: 8, <default>$depth, 150
=== time: 1050, <default>$depth, 151
=== time: 2374, <default>$depth, 152
=== time: 743, <default>$depth, 153
=== time: 651, <default>$depth, 154
=== time: 20, <default>$depth, 155
=== time: 1062, <default>$depth, 156
=== time: 19, <default>$depth, 157
=== time: 1318, <default>$depth, 158
=== time: 1355, <default>$depth, 159
=== time: 1462, <default>$depth, 160
=== time: 553, <default>$depth, 161
=== time: 1147, <default>$depth, 162
=== time: 42, <default>$depth, 163
=== time: 1525, <default>$depth, 164
=== time: 11, <default>$depth, 165
=== time: 6, <default>$depth, 166
=== time: 5, <default>$depth, 167
=== time: 5, <default>$depth, 168
=== time: 5, <default>$depth, 169
=== time: 9, <default>$depth, 170
=== time: 5, <default>$depth, 171
=== time: 5, <default>$depth, 172
=== time: 5, <default>$depth, 173
=== time: 5, <default>$depth, 174
=== time: 5, <default>$depth, 175
=== time: 5, <default>$depth, 176
=== time: 5, <default>$depth, 177
=== time: 5, <default>$depth, 178
=== time: 7, <default>$depth, 179
=== time: 5, <default>$depth, 180
=== time: 6, <default>$depth, 181
=== time: 987, <default>$depth, 182
=== time: 655, <default>$depth, 183
=== time: 748, <default>$depth, 184
=== time: 1132, <default>$depth, 185
=== time: 562, <default>$depth, 186
=== time: 1522, <default>$depth, 187
=== time: 1280, <default>$depth, 188
=== time: 1262, <default>$depth, 189
=== time: 14, <default>$depth, 190
=== time: 579, <default>$depth, 191
=== time: 2300, <default>$depth, 192
=== time: 1078, <default>$depth, 193
=== time: 676, <default>$depth, 194
=== time: 12, <default>$depth, 195
=== time: 6, <default>$depth, 196
=== time: 5, <default>$depth, 197
=== time: 5, <default>$depth, 198
=== time: 1149, <default>$depth, 199
=== time: 1386, <default>$depth, 200
=== time: 13, <default>$depth, 201
=== time: 1745, <default>$depth, 202
=== time: 1043, <default>$depth, 203
=== time: 823, <default>$depth, 204
=== time: 1303, <default>$depth, 205
=== time: 1536, <default>$depth, 206
=== time: 14, <default>$depth, 207
=== time: 1813, <default>$depth, 208
=== time: 16, <default>$depth, 209
=== time: 577, <default>$depth, 210
=== time: 1393, <default>$depth, 211
=== time: 658, <default>$depth, 212
=== time: 12, <default>$depth, 213
=== time: 1341, <default>$depth, 214
=== time: 13, <default>$depth, 215
=== time: 6, <default>$depth, 216
=== time: 1329, <default>$depth, 217
=== time: 1409, <default>$depth, 218
=== time: 11, <default>$depth, 219
=== time: 8, <default>$depth, 220
=== time: 671, <default>$depth, 221
=== time: 1335, <default>$depth, 222
=== time: 1431, <default>$depth, 223
=== time: 29, <default>$depth, 224
=== time: 662, <default>$depth, 225
=== time: 1454, <default>$depth, 226
=== time: 11, <default>$depth, 227
=== time: 559, <default>$depth, 228
=== time: 1178, <default>$depth, 229
=== time: 1278, <default>$depth, 230
=== time: 1856, <default>$depth, 231
=== time: 13, <default>$depth, 232
=== time: 9, <default>$depth, 233
=== time: 5, <default>$depth, 234
=== time: 5, <default>$depth, 235
=== time: 558, <default>$depth, 236
=== time: 1325, <default>$depth, 237
=== time: 1465, <default>$depth, 238
=== time: 11, <default>$depth, 239
=== time: 6, <default>$depth, 240
=== time: 5, <default>$depth, 241
=== time: 20, <default>$depth, 242
=== time: 1503, models/mapobjects/vega/globe/globe$depth, 0
=== time: 61, textures/shared_vega/light01_300, 0
=== time: 6, textures/shared_vega/light01_300, 1
=== time: 16, textures/shared_vega/panel01, 0
=== time: 557, textures/shared_vega/panel01, 1
=== time: 12, textures/shared_vega/panel01, 2
=== time: 6, textures/shared_vega/panel01, 3
=== time: 5, textures/shared_vega/panel01, 4
=== time: 4, textures/shared_vega/panel01, 5
=== time: 6, textures/shared_vega/panel01, 6
=== time: 5, textures/shared_vega/panel01, 7
=== time: 25, textures/shared_vega/trim07, 0
=== time: 6, textures/shared_vega/trim07, 1
=== time: 7, textures/shared_vega/trim07, 2
=== time: 13, textures/shared_vega/trim01, 0
=== time: 5, textures/shared_vega/trim01, 1
=== time: 5, textures/shared_vega/trim01, 2
=== time: 4, textures/shared_vega/trim01, 3
=== time: 6, textures/shared_vega/trim01, 4
=== time: 5, textures/shared_vega/trim01, 5
=== time: 30, textures/shared_vega/panel07, 0
=== time: 11, textures/shared_vega/panel07, 1
=== time: 45, textures/shared_vega/base01, 0
=== time: 42, textures/shared_vega/base01, 1
=== time: 14, textures/shared_vega/base01, 2
=== time: 41, textures/shared_vega/base01, 3
=== time: 10, textures/shared_vega/base01, 4
=== time: 23, textures/shared_vega/base01, 5
=== time: 22, textures/shared_vega/base01, 6
=== time: 39, textures/shared_vega/base01, 7
=== time: 8, textures/shared_vega/base01, 8
=== time: 15, textures/shared_vega/base01, 9
=== time: 62, textures/shared_vega/base01, 10
=== time: 26, textures/shared_vega/base01, 11
=== time: 6, textures/shared_vega/base01, 12
=== time: 8, textures/shared_vega/base01, 13
=== time: 10, textures/shared_vega/base01, 14
=== time: 31, textures/shared_vega/base01, 15
=== time: 14, textures/shared_vega/base01, 16
=== time: 27, textures/shared_vega/base01, 17
=== time: 13, textures/shared_vega/base01, 18
=== time: 6, textures/shared_vega/base01, 19
=== time: 14, textures/shared_vega/base01, 20
=== time: 22, textures/shared_vega/base01, 21
=== time: 20, textures/shared_vega/base01, 22
=== time: 10, textures/shared_vega/base01, 23
=== time: 7, textures/shared_vega/base01, 24
=== time: 34, textures/shared_vega/base01, 25
=== time: 4, textures/shared_vega/base01, 26
=== time: 32, textures/shared_vega/base01, 27
=== time: 13, textures/shared_vega/base01, 28
=== time: 9, textures/shared_vega/base01, 29
=== time: 5, textures/shared_vega/base01, 30
=== time: 6, textures/shared_vega/base01, 31
=== time: 20, textures/shared_vega/base01, 32
=== time: 22, textures/shared_vega/base01, 33
=== time: 4, textures/shared_vega/base01, 34
=== time: 9, textures/shared_vega/base01, 35
=== time: 29, textures/shared_vega/base01, 36
=== time: 34, textures/shared_vega/base01, 37
=== time: 5, textures/shared_vega/base01, 38
=== time: 8, textures/shared_vega/base01, 39
=== time: 811, textures/shared_vega/base03a, 0
=== time: 840, textures/shared_vega/base03a, 1
=== time: 737, textures/shared_vega/base03a, 2
=== time: 1166, textures/shared_vega/base03a, 3
=== time: 14, textures/shared_vega/base03a, 4
=== time: 12, textures/shared_vega/base03a, 5
=== time: 679, textures/shared_vega/base03a, 6
=== time: 733, textures/shared_vega/base03a, 7
=== time: 667, textures/shared_vega/base03a, 8
=== time: 19, textures/shared_vega/base03a, 9
=== time: 1216, textures/shared_vega/base01a, 0
=== time: 1077, textures/shared_vega/base01a, 1
=== time: 1138, textures/shared_vega/base01a, 2
=== time: 21, textures/shared_vega/base01a, 3
=== time: 984, textures/shared_vega/base01a, 4
=== time: 1101, textures/shared_vega/base01a, 5
=== time: 1098, textures/shared_vega/base01a, 6
=== time: 1010, textures/shared_vega/base01a, 7
=== time: 1055, textures/shared_vega/base01a, 8
=== time: 992, textures/shared_vega/base01a, 9
=== time: 1322, textures/shared_vega/trim03a, 0
=== time: 16, textures/shared_vega/trim03a, 1
=== time: 1183, textures/shared_vega/trim03a, 2
=== time: 11, textures/shared_vega/trim03a, 3
=== time: 1198, textures/shared_vega/trim03a, 4
=== time: 1266, textures/shared_vega/trim03a, 5
=== time: 15, textures/shared_vega/trim03a, 6
=== time: 14, textures/shared_vega/trim03a, 7
=== time: 7, textures/shared_vega/trim03a, 8
=== time: 1132, textures/shared_vega/trim03a, 9
=== time: 501, textures/shared_vega/trim03a, 10
=== time: 1261, textures/shared_vega/trim03a, 11
=== time: 1793, textures/shared_vega/trim03a, 12
=== time: 13, textures/shared_vega/trim03a, 13
=== time: 6, textures/shared_vega/trim03a, 14
=== time: 6, textures/shared_vega/trim03a, 15
=== time: 9, textures/shared_vega/trim03a, 16
=== time: 11, textures/shared_vega/trim03a, 17
=== time: 1139, textures/shared_vega/trim03a, 18
=== time: 21, textures/shared_vega/trim03a, 19
=== time: 1132, textures/shared_vega/trim03a, 20
=== time: 1417, textures/shared_vega/base03, 0
=== time: 1322, textures/shared_vega/base03, 1
=== time: 1270, textures/shared_vega/base03, 2
=== time: 1410, textures/shared_vega/base03, 3
=== time: 1319, textures/shared_vega/trim02, 0
=== time: 1328, textures/shared_vega/panel01a, 0
=== time: 1474, textures/shared_vega/trim04, 0
=== time: 16, textures/shared_vega/trim04, 1
=== time: 1378, textures/shared_vega/trim04, 2
=== time: 18, textures/shared_vega/trim04, 3
=== time: 4, textures/shared_vega/trim04, 4
=== time: 8, textures/shared_vega/trim04, 5
=== time: 5, textures/shared_vega/trim04, 6
=== time: 2873, textures/shared_vega/trim04, 7
=== time: 4393, textures/shared_vega/trim04, 8
=== time: 18, textures/shared_vega/trim04, 9
=== time: 5, textures/shared_vega/trim04, 10
=== time: 7, textures/shared_vega/trim04, 11
=== time: 48, textures/shared_vega/base02, 0
=== time: 18, textures/shared_vega/base02, 1
=== time: 5, textures/shared_vega/base02, 2
=== time: 13, textures/shared_vega/base02, 3
=== time: 14, textures/shared_vega/base02, 4
=== time: 12, textures/shared_vega/base02, 5
=== time: 5, textures/shared_vega/base02, 6
=== time: 17, textures/shared_vega/trim02a, 0
=== time: 20, textures/shared_vega/panel06_300, 0
=== time: 7, textures/shared_vega/panel06_300, 1
=== time: 6, textures/shared_vega/panel06_300, 2
=== time: 31, textures/shared_vega/trim03, 0
=== time: 12, textures/shared_vega/trim03, 1
=== time: 9, textures/shared_vega/trim03, 2
=== time: 14, textures/shared_vega/trim03, 3
=== time: 21, textures/shared_vega/trim03b, 0
=== time: 36, textures/shared_vega/concrete01, 0
=== time: 7, textures/shared_vega/concrete01, 1
=== time: 6, textures/shared_vega/concrete01, 2
=== time: 7, textures/shared_vega/concrete01, 3
=== time: 14, textures/shared_vega/concrete01, 4
=== time: 6, textures/shared_vega/concrete01, 5
=== time: 12, textures/shared_vega/concrete01, 6
=== time: 11, textures/shared_vega/concrete01, 7
=== time: 8, textures/shared_vega/concrete01, 8
=== time: 4, textures/shared_vega/concrete01, 9
=== time: 17, textures/shared_vega/concrete01, 10
=== time: 12, textures/shared_vega/concrete01, 11
=== time: 5, textures/shared_vega/concrete01, 12
=== time: 10, textures/shared_vega/concrete01, 13
=== time: 20, textures/shared_vega/trim01a, 0
=== time: 42, textures/shared_vega/trim01a, 1
=== time: 8, textures/shared_vega/trim01a, 2
=== time: 92, models/mapobjects/vega/globe/globe, 0
=== time: 2182, models/mapobjects/vega/container/container2, 0
=== time: 29, models/mapobjects/vega/container/container2, 1
=== time: 20, textures/shared_vega/floor02, 0
=== time: 1650, textures/shared_vega/floor02, 1
=== time: 1100, textures/shared_vega/floor02, 2
=== time: 6, textures/shared_vega/floor02, 3
=== time: 2551, textures/shared_vega/floor01, 0
=== time: 20, textures/shared_vega/floor01, 1
=== time: 6, textures/shared_vega/floor01, 2
=== time: 5, textures/shared_vega/floor01, 3
=== time: 5, textures/shared_vega/floor01, 4
=== time: 5, textures/shared_vega/floor01, 5
=== time: 5, textures/shared_vega/floor01, 6
=== time: 1555, textures/shared_vega/dirt01, 0
=== time: 11, textures/shared_vega/dirt01, 1
=== time: 629, textures/shared_vega/light03_1500, 0
=== time: 1641, textures/shared_vega/light03_1500, 1
=== time: 1780, textures/shared_vega/panel05_300, 0
=== time: 12, textures/shared_vega/panel05_300, 1
=== time: 18, textures/shared_vega/panel02, 0
=== time: 11, textures/shared_vega/asphalt01, 0
=== time: 35, models/mapobjects/vega/monitor/display_door_red, 0
=== time: 8, models/mapobjects/vega/monitor/display_door_blue, 0
=== time: 1892, models/mapobjects/vega/monitor/monitor, 0
=== time: 32, models/mapobjects/vega/emitter/emitter, 0
=== time: 1815, textures/vega_custom/starchart02, 0
=== time: 45, textures/vega_custom/roadstrip, 0
=== time: 23, textures/vega_custom/wheeltracks, 0
=== time: 18, textures/vega_custom/vega_banner, 0
=== time: 17, textures/shared_vega/grate01, 0
=== time: 1586, textures/shared_vega/grate01, 1
=== time: 19, textures/shared_vega/grate01, 2
=== time: 54, textures/vega_custom/holo_projector, 0
=== time: 19, textures/shared_vega/glass01, 0
=== time: 3, textures/shared_vega/glass01, 1
=== time: 2, textures/shared_vega/glass01, 2
=== time: 2, textures/shared_vega/glass01, 3
=== time: 2, textures/shared_vega/glass01, 4
=== time: 2, textures/shared_vega/glass01, 5
=== time: 2, textures/shared_vega/glass01, 6
=== time: 16, textures/shared_vega/glass01, 0
=== time: 5, textures/shared_vega/glass01, 1
=== time: 7, textures/shared_vega/glass01, 2
=== time: 6, textures/shared_vega/glass01, 3
=== time: 5, textures/shared_vega/glass01, 4
=== time: 5, textures/shared_vega/glass01, 5
=== time: 2917, textures/shared_vega/glass01, 6
=== time: 59, textures/shared_vega/glass01, 0
=== time: 6, textures/shared_vega/glass01, 1
=== time: 7, textures/shared_vega/glass01, 2
=== time: 5, textures/shared_vega/glass01, 3
=== time: 5, textures/shared_vega/glass01, 4
=== time: 4, textures/shared_vega/glass01, 5
=== time: 8, textures/shared_vega/glass01, 6
=== time: 40, textures/shared_vega/glass01, 0

@illwieckz
Copy link
Member Author

With this:

--- a/src/engine/renderer/tr_shade.cpp
+++ b/src/engine/renderer/tr_shade.cpp
@@ -327,7 +327,15 @@ void Tess_DrawElements()
 	{
 		if ( tess.multiDrawPrimitives )
 		{
-			glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+			// glMultiDrawElements( GL_TRIANGLES, tess.multiDrawCounts, GL_INDEX_TYPE, ( const GLvoid ** ) tess.multiDrawIndexes, tess.multiDrawPrimitives );
+
+			for ( i = 0; i < tess.multiDrawPrimitives; i++ )
+			{
+				clock_t start = clock();
+				glDrawElements( GL_TRIANGLES, tess.multiDrawCounts[ i ], GL_INDEX_TYPE, tess.multiDrawIndexes[ i ] );
+				clock_t diff = clock() - start;
+				printf("=== time: %ld, %s#%d, drawCounts: %d\n", diff, tess.surfaceShader->name, i, tess.multiDrawCounts[ i ] );
+			}
 
 			backEnd.pc.c_multiDrawElements++;
 			backEnd.pc.c_multiDrawPrimitives += tess.multiDrawPrimitives;

I get that:

=== time: 123, <default>$depth#0: drawCounts 633
=== time: 48, <default>$depth#1: drawCounts 486
=== time: 38, <default>$depth#2: drawCounts 552
=== time: 18, <default>$depth#3: drawCounts 255
=== time: 27, <default>$depth#4: drawCounts 324
=== time: 48, <default>$depth#5: drawCounts 84
=== time: 22, <default>$depth#6: drawCounts 36
=== time: 29, <default>$depth#7: drawCounts 240
=== time: 12, <default>$depth#8: drawCounts 120
=== time: 22, <default>$depth#9: drawCounts 390
=== time: 15, <default>$depth#10: drawCounts 54
=== time: 46, <default>$depth#11: drawCounts 1281
=== time: 15, <default>$depth#12: drawCounts 48
=== time: 5, <default>$depth#13: drawCounts 6
=== time: 9, <default>$depth#14: drawCounts 108
=== time: 5, <default>$depth#15: drawCounts 6
=== time: 32, <default>$depth#16: drawCounts 96
=== time: 23, <default>$depth#17: drawCounts 363
=== time: 28, <default>$depth#18: drawCounts 399
=== time: 39, <default>$depth#19: drawCounts 1110
=== time: 42, <default>$depth#20: drawCounts 318
=== time: 16, <default>$depth#21: drawCounts 75
=== time: 8, <default>$depth#22: drawCounts 84
=== time: 13, <default>$depth#23: drawCounts 24
=== time: 7, <default>$depth#24: drawCounts 24
=== time: 7, <default>$depth#25: drawCounts 48
=== time: 21, <default>$depth#26: drawCounts 303
=== time: 6, <default>$depth#27: drawCounts 12
=== time: 35, <default>$depth#28: drawCounts 6
=== time: 8, <default>$depth#29: drawCounts 72
=== time: 7, <default>$depth#30: drawCounts 90
=== time: 8, <default>$depth#31: drawCounts 78
=== time: 21, <default>$depth#32: drawCounts 255
=== time: 62, <default>$depth#33: drawCounts 1401
=== time: 43, <default>$depth#34: drawCounts 465
=== time: 44, <default>$depth#35: drawCounts 732
=== time: 12, <default>$depth#36: drawCounts 12
=== time: 13, <default>$depth#37: drawCounts 48
=== time: 9, <default>$depth#38: drawCounts 42
=== time: 5, <default>$depth#39: drawCounts 6
=== time: 13, <default>$depth#40: drawCounts 12
=== time: 23, <default>$depth#41: drawCounts 246
=== time: 5, <default>$depth#42: drawCounts 12
=== time: 10, <default>$depth#43: drawCounts 6
=== time: 15, <default>$depth#44: drawCounts 60
=== time: 12, <default>$depth#45: drawCounts 42
=== time: 5, <default>$depth#46: drawCounts 18
=== time: 7, <default>$depth#47: drawCounts 54
=== time: 35, <default>$depth#48: drawCounts 24
=== time: 12, <default>$depth#49: drawCounts 33
=== time: 6, <default>$depth#50: drawCounts 18
=== time: 5, <default>$depth#51: drawCounts 18
=== time: 38, <default>$depth#52: drawCounts 555
=== time: 34, <default>$depth#53: drawCounts 48
=== time: 14, <default>$depth#54: drawCounts 48
=== time: 6, <default>$depth#55: drawCounts 24
=== time: 6, <default>$depth#56: drawCounts 18
=== time: 16, <default>$depth#57: drawCounts 24
=== time: 6, <default>$depth#58: drawCounts 18
=== time: 5, <default>$depth#59: drawCounts 6
=== time: 9, <default>$depth#60: drawCounts 117
=== time: 9, <default>$depth#61: drawCounts 102
=== time: 24, <default>$depth#62: drawCounts 72
=== time: 15, <default>$depth#63: drawCounts 126
=== time: 11, <default>$depth#64: drawCounts 12
=== time: 11, <default>$depth#65: drawCounts 6
=== time: 7, <default>$depth#66: drawCounts 12
=== time: 12, <default>$depth#67: drawCounts 24
=== time: 7, <default>$depth#68: drawCounts 72
=== time: 11, <default>$depth#69: drawCounts 12
=== time: 12, <default>$depth#70: drawCounts 48
=== time: 35, <default>$depth#71: drawCounts 216
=== time: 7, <default>$depth#72: drawCounts 36
=== time: 92, <default>$depth#73: drawCounts 1713
=== time: 12, <default>$depth#74: drawCounts 12
=== time: 11, <default>$depth#75: drawCounts 6
=== time: 39, <default>$depth#76: drawCounts 531
=== time: 14, <default>$depth#77: drawCounts 237
=== time: 27, <default>$depth#78: drawCounts 726
=== time: 14, <default>$depth#79: drawCounts 309
=== time: 122, <default>$depth#80: drawCounts 2040
=== time: 19, <default>$depth#81: drawCounts 138
=== time: 6, <default>$depth#82: drawCounts 9
=== time: 7, <default>$depth#83: drawCounts 6
=== time: 1953, <default>$depth#84: drawCounts 18
=== time: 1806, <default>$depth#85: drawCounts 18
=== time: 12, <default>$depth#86: drawCounts 12
=== time: 8, <default>$depth#87: drawCounts 72
=== time: 6, <default>$depth#88: drawCounts 12
=== time: 7, <default>$depth#89: drawCounts 66
=== time: 15, <default>$depth#90: drawCounts 234
=== time: 19, <default>$depth#91: drawCounts 96
=== time: 5, <default>$depth#92: drawCounts 6
=== time: 1930, <default>$depth#93: drawCounts 3
=== time: 21, <default>$depth#94: drawCounts 195
=== time: 5, <default>$depth#95: drawCounts 12
=== time: 6, <default>$depth#96: drawCounts 36
=== time: 7, <default>$depth#97: drawCounts 54
=== time: 423, <default>$depth#98: drawCounts 66
=== time: 782, <default>$depth#99: drawCounts 885
=== time: 1841, <default>$depth#100: drawCounts 444
=== time: 11, <default>$depth#101: drawCounts 12
=== time: 62, <default>$depth#102: drawCounts 540
=== time: 1427, <default>$depth#103: drawCounts 12
=== time: 1597, <default>$depth#104: drawCounts 12
=== time: 58, <default>$depth#105: drawCounts 450
=== time: 1226, <default>$depth#106: drawCounts 12
=== time: 13, <default>$depth#107: drawCounts 6
=== time: 1839, <default>$depth#108: drawCounts 276
=== time: 20, <default>$depth#109: drawCounts 237
=== time: 680, <default>$depth#110: drawCounts 435
=== time: 17, <default>$depth#111: drawCounts 84
=== time: 942, <default>$depth#112: drawCounts 36
=== time: 23, <default>$depth#113: drawCounts 297
=== time: 6, <default>$depth#114: drawCounts 9
=== time: 6, <default>$depth#115: drawCounts 15
=== time: 5, <default>$depth#116: drawCounts 6
=== time: 5, <default>$depth#117: drawCounts 12
=== time: 32, <default>$depth#118: drawCounts 66
=== time: 2252, <default>$depth#119: drawCounts 126
=== time: 1397, <default>$depth#120: drawCounts 54
=== time: 1928, <default>$depth#121: drawCounts 21
=== time: 13, <default>$depth#122: drawCounts 12
=== time: 1170, <default>$depth#123: drawCounts 360
=== time: 1028, <default>$depth#124: drawCounts 564
=== time: 1630, <default>$depth#125: drawCounts 6
=== time: 14, <default>$depth#126: drawCounts 36
=== time: 969, <default>$depth#127: drawCounts 6
=== time: 18, <default>$depth#128: drawCounts 96
=== time: 1256, <default>$depth#129: drawCounts 1536
=== time: 1281, <default>$depth#130: drawCounts 24
=== time: 765, <default>$depth#131: drawCounts 936
=== time: 1176, <default>$depth#132: drawCounts 18
=== time: 12, <default>$depth#133: drawCounts 36
=== time: 1589, <default>$depth#134: drawCounts 24
=== time: 14, <default>$depth#135: drawCounts 63
=== time: 690, <default>$depth#136: drawCounts 723
=== time: 725, <default>$depth#137: drawCounts 96
=== time: 1645, <default>$depth#138: drawCounts 195
=== time: 16, <default>$depth#139: drawCounts 27
=== time: 5, <default>$depth#140: drawCounts 6
=== time: 6, <default>$depth#141: drawCounts 12
=== time: 5, <default>$depth#142: drawCounts 12
=== time: 6, <default>$depth#143: drawCounts 66
=== time: 6, <default>$depth#144: drawCounts 36
=== time: 5, <default>$depth#145: drawCounts 6
=== time: 1635, <default>$depth#146: drawCounts 12
=== time: 30, <default>$depth#147: drawCounts 318
=== time: 6, <default>$depth#148: drawCounts 12
=== time: 5, <default>$depth#149: drawCounts 6
=== time: 7, <default>$depth#150: drawCounts 48
=== time: 1575, <default>$depth#151: drawCounts 36
=== time: 1308, <default>$depth#152: drawCounts 168
=== time: 775, <default>$depth#153: drawCounts 423
=== time: 610, <default>$depth#154: drawCounts 612
=== time: 18, <default>$depth#155: drawCounts 240
=== time: 1213, <default>$depth#156: drawCounts 42
=== time: 21, <default>$depth#157: drawCounts 192
=== time: 1430, <default>$depth#158: drawCounts 12
=== time: 1359, <default>$depth#159: drawCounts 192
=== time: 843, <default>$depth#160: drawCounts 126
=== time: 635, <default>$depth#161: drawCounts 9
=== time: 1314, <default>$depth#162: drawCounts 6
=== time: 11, <default>$depth#163: drawCounts 36
=== time: 1707, <default>$depth#164: drawCounts 12
=== time: 13, <default>$depth#165: drawCounts 12
=== time: 6, <default>$depth#166: drawCounts 12
=== time: 6, <default>$depth#167: drawCounts 12
=== time: 5, <default>$depth#168: drawCounts 12
=== time: 5, <default>$depth#169: drawCounts 24
=== time: 10, <default>$depth#170: drawCounts 99
=== time: 6, <default>$depth#171: drawCounts 15
=== time: 5, <default>$depth#172: drawCounts 6
=== time: 5, <default>$depth#173: drawCounts 12
=== time: 5, <default>$depth#174: drawCounts 12
=== time: 5, <default>$depth#175: drawCounts 6
=== time: 5, <default>$depth#176: drawCounts 6
=== time: 6, <default>$depth#177: drawCounts 12
=== time: 5, <default>$depth#178: drawCounts 24
=== time: 7, <default>$depth#179: drawCounts 60
=== time: 5, <default>$depth#180: drawCounts 18
=== time: 6, <default>$depth#181: drawCounts 24
=== time: 1089, <default>$depth#182: drawCounts 192
=== time: 730, <default>$depth#183: drawCounts 99
=== time: 725, <default>$depth#184: drawCounts 153
=== time: 1212, <default>$depth#185: drawCounts 18
=== time: 592, <default>$depth#186: drawCounts 363
=== time: 807, <default>$depth#187: drawCounts 12
=== time: 2288, <default>$depth#188: drawCounts 99
=== time: 1866, <default>$depth#189: drawCounts 12
=== time: 12, <default>$depth#190: drawCounts 18
=== time: 589, <default>$depth#191: drawCounts 429
=== time: 1251, <default>$depth#192: drawCounts 516
=== time: 1072, <default>$depth#193: drawCounts 870
=== time: 737, <default>$depth#194: drawCounts 636
=== time: 12, <default>$depth#195: drawCounts 15
=== time: 6, <default>$depth#196: drawCounts 9
=== time: 6, <default>$depth#197: drawCounts 18
=== time: 5, <default>$depth#198: drawCounts 12
=== time: 1223, <default>$depth#199: drawCounts 165
=== time: 1251, <default>$depth#200: drawCounts 72
=== time: 15, <default>$depth#201: drawCounts 12
=== time: 1825, <default>$depth#202: drawCounts 27
=== time: 1123, <default>$depth#203: drawCounts 762
=== time: 1129, <default>$depth#204: drawCounts 1548
=== time: 1337, <default>$depth#205: drawCounts 30
=== time: 1926, <default>$depth#206: drawCounts 12
=== time: 14, <default>$depth#207: drawCounts 36
=== time: 1919, <default>$depth#208: drawCounts 276
=== time: 17, <default>$depth#209: drawCounts 129
=== time: 749, <default>$depth#210: drawCounts 675
=== time: 1437, <default>$depth#211: drawCounts 48
=== time: 891, <default>$depth#212: drawCounts 615
=== time: 13, <default>$depth#213: drawCounts 21
=== time: 1445, <default>$depth#214: drawCounts 12
=== time: 14, <default>$depth#215: drawCounts 48
=== time: 7, <default>$depth#216: drawCounts 36
=== time: 1202, <default>$depth#217: drawCounts 270
=== time: 1425, <default>$depth#218: drawCounts 21
=== time: 11, <default>$depth#219: drawCounts 12
=== time: 8, <default>$depth#220: drawCounts 72
=== time: 646, <default>$depth#221: drawCounts 582
=== time: 1459, <default>$depth#222: drawCounts 36
=== time: 1421, <default>$depth#223: drawCounts 54
=== time: 35, <default>$depth#224: drawCounts 378
=== time: 691, <default>$depth#225: drawCounts 192
=== time: 1594, <default>$depth#226: drawCounts 18
=== time: 13, <default>$depth#227: drawCounts 12
=== time: 588, <default>$depth#228: drawCounts 60
=== time: 1202, <default>$depth#229: drawCounts 174
=== time: 1413, <default>$depth#230: drawCounts 78
=== time: 1821, <default>$depth#231: drawCounts 12
=== time: 12, <default>$depth#232: drawCounts 12
=== time: 10, <default>$depth#233: drawCounts 102
=== time: 5, <default>$depth#234: drawCounts 12
=== time: 6, <default>$depth#235: drawCounts 12
=== time: 643, <default>$depth#236: drawCounts 117
=== time: 1456, <default>$depth#237: drawCounts 12
=== time: 995, <default>$depth#238: drawCounts 174
=== time: 15, <default>$depth#239: drawCounts 9
=== time: 6, <default>$depth#240: drawCounts 36
=== time: 6, <default>$depth#241: drawCounts 12
=== time: 19, <default>$depth#242: drawCounts 264
=== time: 2677, models/mapobjects/vega/globe/globe$depth#0: drawCounts 1260
=== time: 67, textures/shared_vega/light01_300#0: drawCounts 24
=== time: 6, textures/shared_vega/light01_300#1: drawCounts 18
=== time: 18, textures/shared_vega/panel01#0: drawCounts 114
=== time: 234, textures/shared_vega/panel01#1: drawCounts 15
=== time: 13, textures/shared_vega/panel01#2: drawCounts 36
=== time: 5, textures/shared_vega/panel01#3: drawCounts 21
=== time: 5, textures/shared_vega/panel01#4: drawCounts 15
=== time: 5, textures/shared_vega/panel01#5: drawCounts 9
=== time: 6, textures/shared_vega/panel01#6: drawCounts 21
=== time: 5, textures/shared_vega/panel01#7: drawCounts 9
=== time: 27, textures/shared_vega/trim07#0: drawCounts 54
=== time: 7, textures/shared_vega/trim07#1: drawCounts 36
=== time: 8, textures/shared_vega/trim07#2: drawCounts 72
=== time: 13, textures/shared_vega/trim01#0: drawCounts 78
=== time: 5, textures/shared_vega/trim01#1: drawCounts 12
=== time: 30, textures/shared_vega/trim01#2: drawCounts 18
=== time: 5, textures/shared_vega/trim01#3: drawCounts 12
=== time: 5, textures/shared_vega/trim01#4: drawCounts 36
=== time: 5, textures/shared_vega/trim01#5: drawCounts 12
=== time: 57, textures/shared_vega/panel07#0: drawCounts 144
=== time: 12, textures/shared_vega/panel07#1: drawCounts 144
=== time: 56, textures/shared_vega/base01#0: drawCounts 486
=== time: 48, textures/shared_vega/base01#1: drawCounts 552
=== time: 15, textures/shared_vega/base01#2: drawCounts 255
=== time: 42, textures/shared_vega/base01#3: drawCounts 1281
=== time: 11, textures/shared_vega/base01#4: drawCounts 96
=== time: 22, textures/shared_vega/base01#5: drawCounts 363
=== time: 30, textures/shared_vega/base01#6: drawCounts 399
=== time: 40, textures/shared_vega/base01#7: drawCounts 1110
=== time: 7, textures/shared_vega/base01#8: drawCounts 48
=== time: 26, textures/shared_vega/base01#9: drawCounts 255
=== time: 62, textures/shared_vega/base01#10: drawCounts 1401
=== time: 53, textures/shared_vega/base01#11: drawCounts 465
=== time: 6, textures/shared_vega/base01#12: drawCounts 18
=== time: 9, textures/shared_vega/base01#13: drawCounts 117
=== time: 29, textures/shared_vega/base01#14: drawCounts 102
=== time: 32, textures/shared_vega/base01#15: drawCounts 531
=== time: 13, textures/shared_vega/base01#16: drawCounts 237
=== time: 49, textures/shared_vega/base01#17: drawCounts 726
=== time: 14, textures/shared_vega/base01#18: drawCounts 309
=== time: 7, textures/shared_vega/base01#19: drawCounts 66
=== time: 14, textures/shared_vega/base01#20: drawCounts 237
=== time: 24, textures/shared_vega/base01#21: drawCounts 435
=== time: 22, textures/shared_vega/base01#22: drawCounts 84
=== time: 10, textures/shared_vega/base01#23: drawCounts 96
=== time: 7, textures/shared_vega/base01#24: drawCounts 63
=== time: 55, textures/shared_vega/base01#25: drawCounts 723
=== time: 6, textures/shared_vega/base01#26: drawCounts 6
=== time: 33, textures/shared_vega/base01#27: drawCounts 612
=== time: 22, textures/shared_vega/base01#28: drawCounts 240
=== time: 9, textures/shared_vega/base01#29: drawCounts 99
=== time: 5, textures/shared_vega/base01#30: drawCounts 18
=== time: 6, textures/shared_vega/base01#31: drawCounts 24
=== time: 39, textures/shared_vega/base01#32: drawCounts 363
=== time: 24, textures/shared_vega/base01#33: drawCounts 429
=== time: 5, textures/shared_vega/base01#34: drawCounts 12
=== time: 9, textures/shared_vega/base01#35: drawCounts 129
=== time: 49, textures/shared_vega/base01#36: drawCounts 675
=== time: 34, textures/shared_vega/base01#37: drawCounts 582
=== time: 23, textures/shared_vega/base01#38: drawCounts 12
=== time: 9, textures/shared_vega/base01#39: drawCounts 117
=== time: 787, textures/shared_vega/base03a#0: drawCounts 786
=== time: 923, textures/shared_vega/base03a#1: drawCounts 2706
=== time: 724, textures/shared_vega/base03a#2: drawCounts 1233
=== time: 778, textures/shared_vega/base03a#3: drawCounts 519
=== time: 15, textures/shared_vega/base03a#4: drawCounts 99
=== time: 12, textures/shared_vega/base03a#5: drawCounts 153
=== time: 768, textures/shared_vega/base03a#6: drawCounts 636
=== time: 700, textures/shared_vega/base03a#7: drawCounts 615
=== time: 448, textures/shared_vega/base03a#8: drawCounts 378
=== time: 23, textures/shared_vega/base03a#9: drawCounts 192
=== time: 1326, textures/shared_vega/base01a#0: drawCounts 2850
=== time: 1125, textures/shared_vega/base01a#1: drawCounts 2100
=== time: 1166, textures/shared_vega/base01a#2: drawCounts 42
=== time: 20, textures/shared_vega/base01a#3: drawCounts 192
=== time: 1100, textures/shared_vega/base01a#4: drawCounts 192
=== time: 1117, textures/shared_vega/base01a#5: drawCounts 516
=== time: 1105, textures/shared_vega/base01a#6: drawCounts 870
=== time: 1126, textures/shared_vega/base01a#7: drawCounts 762
=== time: 1112, textures/shared_vega/base01a#8: drawCounts 1548
=== time: 592, textures/shared_vega/base01a#9: drawCounts 102
=== time: 2214, textures/shared_vega/trim03a#0: drawCounts 390
=== time: 598, textures/shared_vega/trim03a#1: drawCounts 90
=== time: 1346, textures/shared_vega/trim03a#2: drawCounts 246
=== time: 11, textures/shared_vega/trim03a#3: drawCounts 12
=== time: 1269, textures/shared_vega/trim03a#4: drawCounts 555
=== time: 1281, textures/shared_vega/trim03a#5: drawCounts 24
=== time: 14, textures/shared_vega/trim03a#6: drawCounts 72
=== time: 15, textures/shared_vega/trim03a#7: drawCounts 234
=== time: 7, textures/shared_vega/trim03a#8: drawCounts 66
=== time: 1204, textures/shared_vega/trim03a#9: drawCounts 126
=== time: 1255, textures/shared_vega/trim03a#10: drawCounts 36
=== time: 12, textures/shared_vega/trim03a#11: drawCounts 18
=== time: 1372, textures/shared_vega/trim03a#12: drawCounts 345
=== time: 12, textures/shared_vega/trim03a#13: drawCounts 12
=== time: 7, textures/shared_vega/trim03a#14: drawCounts 36
=== time: 6, textures/shared_vega/trim03a#15: drawCounts 18
=== time: 9, textures/shared_vega/trim03a#16: drawCounts 99
=== time: 11, textures/shared_vega/trim03a#17: drawCounts 165
=== time: 551, textures/shared_vega/trim03a#18: drawCounts 270
=== time: 24, textures/shared_vega/trim03a#19: drawCounts 174
=== time: 1244, textures/shared_vega/trim03a#20: drawCounts 264
=== time: 1439, textures/shared_vega/base03#0: drawCounts 642
=== time: 1341, textures/shared_vega/base03#1: drawCounts 816
=== time: 1340, textures/shared_vega/base03#2: drawCounts 12
=== time: 1412, textures/shared_vega/base03#3: drawCounts 396
=== time: 1379, textures/shared_vega/trim02#0: drawCounts 30
=== time: 1608, textures/shared_vega/panel01a#0: drawCounts 12
=== time: 2835, textures/shared_vega/trim04#0: drawCounts 6
=== time: 16, textures/shared_vega/trim04#1: drawCounts 78
=== time: 1423, textures/shared_vega/trim04#2: drawCounts 66
=== time: 65, textures/shared_vega/trim04#3: drawCounts 108
=== time: 5, textures/shared_vega/trim04#4: drawCounts 6
=== time: 7, textures/shared_vega/trim04#5: drawCounts 54
=== time: 4, textures/shared_vega/trim04#6: drawCounts 6
=== time: 3003, textures/shared_vega/trim04#7: drawCounts 66
=== time: 4734, textures/shared_vega/trim04#8: drawCounts 72
=== time: 18, textures/shared_vega/trim04#9: drawCounts 21
=== time: 4, textures/shared_vega/trim04#10: drawCounts 12
=== time: 7, textures/shared_vega/trim04#11: drawCounts 78
=== time: 45, textures/shared_vega/base02#0: drawCounts 72
=== time: 18, textures/shared_vega/base02#1: drawCounts 138
=== time: 4, textures/shared_vega/base02#2: drawCounts 12
=== time: 13, textures/shared_vega/base02#3: drawCounts 54
=== time: 15, textures/shared_vega/base02#4: drawCounts 84
=== time: 13, textures/shared_vega/base02#5: drawCounts 54
=== time: 5, textures/shared_vega/base02#6: drawCounts 12
=== time: 19, textures/shared_vega/trim02a#0: drawCounts 18
=== time: 19, textures/shared_vega/panel06_300#0: drawCounts 30
=== time: 7, textures/shared_vega/panel06_300#1: drawCounts 36
=== time: 5, textures/shared_vega/panel06_300#2: drawCounts 36
=== time: 30, textures/shared_vega/trim03#0: drawCounts 240
=== time: 11, textures/shared_vega/trim03#1: drawCounts 120
=== time: 9, textures/shared_vega/trim03#2: drawCounts 108
=== time: 27, textures/shared_vega/trim03#3: drawCounts 72
=== time: 21, textures/shared_vega/trim03b#0: drawCounts 78
=== time: 35, textures/shared_vega/concrete01#0: drawCounts 633
=== time: 7, textures/shared_vega/concrete01#1: drawCounts 84
=== time: 7, textures/shared_vega/concrete01#2: drawCounts 75
=== time: 7, textures/shared_vega/concrete01#3: drawCounts 84
=== time: 15, textures/shared_vega/concrete01#4: drawCounts 303
=== time: 6, textures/shared_vega/concrete01#5: drawCounts 48
=== time: 12, textures/shared_vega/concrete01#6: drawCounts 24
=== time: 12, textures/shared_vega/concrete01#7: drawCounts 33
=== time: 8, textures/shared_vega/concrete01#8: drawCounts 126
=== time: 11, textures/shared_vega/concrete01#9: drawCounts 48
=== time: 20, textures/shared_vega/concrete01#10: drawCounts 195
=== time: 12, textures/shared_vega/concrete01#11: drawCounts 207
=== time: 6, textures/shared_vega/concrete01#12: drawCounts 18
=== time: 10, textures/shared_vega/concrete01#13: drawCounts 174
=== time: 24, textures/shared_vega/trim01a#0: drawCounts 84
=== time: 31, textures/shared_vega/trim01a#1: drawCounts 12
=== time: 7, textures/shared_vega/trim01a#2: drawCounts 72
=== time: 83, models/mapobjects/vega/globe/globe#0: drawCounts 1260
=== time: 1734, models/mapobjects/vega/container/container2#0: drawCounts 276
=== time: 38, models/mapobjects/vega/container/container2#1: drawCounts 276
=== time: 28, textures/shared_vega/floor02#0: drawCounts 36
=== time: 1577, textures/shared_vega/floor02#1: drawCounts 36
=== time: 25, textures/shared_vega/floor02#2: drawCounts 18
=== time: 10, textures/shared_vega/floor02#3: drawCounts 36
=== time: 1851, textures/shared_vega/floor01#0: drawCounts 51
=== time: 34, textures/shared_vega/floor01#1: drawCounts 195
=== time: 9, textures/shared_vega/floor01#2: drawCounts 27
=== time: 15, textures/shared_vega/floor01#3: drawCounts 12
=== time: 10, textures/shared_vega/floor01#4: drawCounts 27
=== time: 14, textures/shared_vega/floor01#5: drawCounts 24
=== time: 9, textures/shared_vega/floor01#6: drawCounts 12
=== time: 1618, textures/shared_vega/dirt01#0: drawCounts 6
=== time: 25, textures/shared_vega/dirt01#1: drawCounts 6
=== time: 824, textures/shared_vega/light03_1500#0: drawCounts 36
=== time: 3418, textures/shared_vega/light03_1500#1: drawCounts 30
=== time: 1806, textures/shared_vega/panel05_300#0: drawCounts 72
=== time: 19, textures/shared_vega/panel05_300#1: drawCounts 18
=== time: 26, textures/shared_vega/panel02#0: drawCounts 18
=== time: 18, textures/shared_vega/asphalt01#0: drawCounts 36
=== time: 38, models/mapobjects/vega/monitor/display_door_red#0: drawCounts 6
=== time: 16, models/mapobjects/vega/monitor/display_door_blue#0: drawCounts 6
=== time: 1847, models/mapobjects/vega/monitor/monitor#0: drawCounts 204
=== time: 45, models/mapobjects/vega/emitter/emitter#0: drawCounts 240
=== time: 1750, textures/vega_custom/starchart02#0: drawCounts 6
=== time: 66, textures/vega_custom/roadstrip#0: drawCounts 12
=== time: 30, textures/vega_custom/wheeltracks#0: drawCounts 72
=== time: 28, textures/vega_custom/vega_banner#0: drawCounts 48
=== time: 26, textures/shared_vega/grate01#0: drawCounts 90
=== time: 1560, textures/shared_vega/grate01#1: drawCounts 60
=== time: 28, textures/shared_vega/grate01#2: drawCounts 60
=== time: 58, textures/vega_custom/holo_projector#0: drawCounts 48
=== time: 25, textures/shared_vega/glass01#0: drawCounts 12
=== time: 3, textures/shared_vega/glass01#1: drawCounts 6
=== time: 9, textures/shared_vega/glass01#2: drawCounts 36
=== time: 4, textures/shared_vega/glass01#3: drawCounts 18
=== time: 5, textures/shared_vega/glass01#4: drawCounts 12
=== time: 8, textures/shared_vega/glass01#5: drawCounts 6
=== time: 4, textures/shared_vega/glass01#6: drawCounts 126
=== time: 28, textures/shared_vega/glass01#0: drawCounts 12
=== time: 6, textures/shared_vega/glass01#1: drawCounts 6
=== time: 18, textures/shared_vega/glass01#2: drawCounts 36
=== time: 9, textures/shared_vega/glass01#3: drawCounts 18
=== time: 13, textures/shared_vega/glass01#4: drawCounts 12
=== time: 9, textures/shared_vega/glass01#5: drawCounts 6
=== time: 1490, textures/shared_vega/glass01#6: drawCounts 126
=== time: 103, textures/shared_vega/glass01#0: drawCounts 12
=== time: 5, textures/shared_vega/glass01#1: drawCounts 6
=== time: 16, textures/shared_vega/glass01#2: drawCounts 36
=== time: 9, textures/shared_vega/glass01#3: drawCounts 18
=== time: 12, textures/shared_vega/glass01#4: drawCounts 12
=== time: 9, textures/shared_vega/glass01#5: drawCounts 6
=== time: 17, textures/shared_vega/glass01#6: drawCounts 126
=== time: 52, textures/shared_vega/glass01#0: drawCounts 6

The time is not even related to draw counts:

=== time: 21, <default>$depth#157: drawCounts 192
=== time: 1430, <default>$depth#158: drawCounts 12
=== time: 1359, <default>$depth#159: drawCounts 192
=== time: 843, <default>$depth#160: drawCounts 126
=== time: 635, <default>$depth#161: drawCounts 9
=== time: 1314, <default>$depth#162: drawCounts 6
=== time: 11, <default>$depth#163: drawCounts 36
=== time: 1707, <default>$depth#164: drawCounts 12
=== time: 13, <default>$depth#165: drawCounts 12

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

As a remind this is the call trace:

Thread 1 (Thread 0x74bd4686ca80 (LWP 519073) "daemon"):
#0  __GI___ioctl (fd=fd@entry=15, request=request@entry=3223348317) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x000074bd47ef00a0 in drmIoctl (fd=15, request=request@entry=3223348317, arg=arg@entry=0x7fff5a18a700) at ../xf86drm.c:704
#2  0x000074bd47ef3260 in drmCommandWriteRead (fd=<optimized out>, drmCommandIndex=drmCommandIndex@entry=29, data=data@entry=0x7fff5a18a700, size=size@entry=32) at ../xf86drm.c:3186
#3  0x000074bd23076013 in radeon_create_bo (rws=rws@entry=0x5f49b6c500e0, size=size@entry=4370432, alignment=alignment@entry=4096, initial_domains=initial_domains@entry=2, flags=flags@entry=16, heap=heap@entry=0) at ../src/gallium/winsys/radeon/drm/radeon_drm_bo.c:632
#4  0x000074bd230767c5 in radeon_winsys_bo_create (rws=0x5f49b6c500e0, size=4370432, alignment=<optimized out>, domain=<optimized out>, flags=<optimized out>) at ../src/gallium/winsys/radeon/drm/radeon_drm_bo.c:1043
#5  0x000074bd2303a5e4 in r300_buffer_create (screen=0x5f49b6c51050, templ=0x7fff5a18a830) at ../src/gallium/drivers/r300/r300_screen_buffer.c:191
#6  0x000074bd22f036a3 in u_upload_alloc_buffer (min_size=4366992, upload=0x5f49b6cfe660) at ../src/gallium/auxiliary/util/u_upload_mgr.c:208
#7  u_upload_alloc (upload=0x5f49b6cfe660, min_out_offset=<optimized out>, size=2816, alignment=alignment@entry=4, out_offset=out_offset@entry=0x7fff5a18a974, outbuf=outbuf@entry=0x7fff5a18a978, ptr=0x7fff5a18a980) at ../src/gallium/auxiliary/util/u_upload_mgr.c:273
#8  0x000074bd22f04ca3 in u_vbuf_translate_buffers (unroll_indices=false, min_index=<optimized out>, num_vertices=176, start_vertex=272761, out_vb=1, vb_mask=<optimized out>, draw=<optimized out>, info=<optimized out>, key=<optimized out>, mgr=0x5f49b6d937a0) at ../src/gallium/auxiliary/util/u_vbuf.c:587
#9  u_vbuf_translate_begin (mgr=mgr@entry=0x5f49b6d937a0, info=info@entry=0x7fff5a18b830, draw=draw@entry=0x7fff5a18b814, start_vertex=start_vertex@entry=272761, num_vertices=num_vertices@entry=176, min_index=<optimized out>, unroll_indices=<optimized out>, misaligned=<optimized out>) at ../src/gallium/auxiliary/util/u_vbuf.c:805
#10 0x000074bd22f07120 in u_vbuf_draw_vbo (pipe=0x5f49b6cad050, info=0x7fff5a18b9e0, drawid_offset=0, indirect=0x0, draws=<optimized out>, num_draws=1) at ../src/gallium/auxiliary/util/u_vbuf.c:1726
#11 0x000074bd22b2c18d in _mesa_validated_drawrangeelements (ctx=ctx@entry=0x5f49b6caf170, index_bo=<optimized out>, mode=mode@entry=4, index_bounds_valid=index_bounds_valid@entry=false, start=start@entry=0, end=end@entry=4294967295, count=276, type=5125, indices=0x2353c8, basevertex=0, numInstances=1, baseInstance=0) at ../src/mesa/main/draw.c:1734
#12 0x000074bd22b2d689 in _mesa_DrawElementsBaseVertex (mode=4, count=276, type=5125, indices=0x2353c8, basevertex=0) at ../src/mesa/main/draw.c:1877
#13 0x00005f49b54c70c2 in Tess_DrawElements () at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:337
#14 0x00005f49b54c8344 in Render_generic3D (pStage=0x74bd44f148c0) at Unvanquished/daemon/src/engine/renderer/tr_shade.cpp:802

From u_vbuf_draw_vbo in mesa/src/gallium/auxiliary/util/u_vbuf.c:1726:

	      /* Translate vertices with non-native layouts or formats. */
	      if (unroll_indices ||
	          incompatible_vb_mask ||
	          mgr->ve->incompatible_elem_mask) {
	         if (!u_vbuf_translate_begin(mgr, &new_info, &new_draw,
	                                     start_vertex, num_vertices,
	                                     min_index, unroll_indices, misaligned)) {

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

I did this:

--- a/src/engine/renderer/tr_backend.cpp
+++ b/src/engine/renderer/tr_backend.cpp
@@ -666,6 +666,7 @@ void GL_VertexAttribsState( uint32_t stateBits )
 	glState.vertexAttribsState = stateBits;
 }
 
+#include <cstdio>
 void GL_VertexAttribPointers( uint32_t attribBits )
 {
 	uint32_t i;
@@ -720,6 +721,7 @@ void GL_VertexAttribPointers( uint32_t attribBits )
 				frame = glState.vertexAttribsOldFrame;
 			}
 
+			printf("=== layout numComponents: %d, componentType: %#x, normalize: %d, stride: %d\n", layout->numComponents, layout->componentType, layout->normalize, layout->stride );
 			glVertexAttribPointer( i, layout->numComponents, layout->componentType, layout->normalize, layout->stride, BUFFER_OFFSET( layout->ofs + ( frame * layout->frameOffset + base ) ) );
 			glState.vertexAttribPointersSet |= bit;
 		}
@@ -4789,6 +4791,7 @@ static void RB_RenderView( bool depthPass )
 	}

I got this:

=== layout numComponents: 3, componentType: 0x1406, normalize: 0, stride: 32
=== layout numComponents: 4, componentType: 0x140b, normalize: 0, stride: 32
=== layout numComponents: 4, componentType: 0x1402, normalize: 1, stride: 32
=== time: 144, <default>$depth#0, drawCounts: 633
=== time: 46, <default>$depth#1, drawCounts: 486
=== time: 40, <default>$depth#2, drawCounts: 552
=== time: 22, <default>$depth#3, drawCounts: 255
=== time: 26, <default>$depth#4, drawCounts: 324
=== time: 25, <default>$depth#5, drawCounts: 84
=== time: 23, <default>$depth#6, drawCounts: 36
=== time: 32, <default>$depth#7, drawCounts: 240
=== time: 20, <default>$depth#8, drawCounts: 120
=== time: 29, <default>$depth#9, drawCounts: 390
=== time: 23, <default>$depth#10, drawCounts: 54
=== time: 58, <default>$depth#11, drawCounts: 1281
=== time: 23, <default>$depth#12, drawCounts: 48
=== time: 10, <default>$depth#13, drawCounts: 6
=== time: 20, <default>$depth#14, drawCounts: 108
=== time: 10, <default>$depth#15, drawCounts: 6
=== time: 25, <default>$depth#16, drawCounts: 96
=== time: 30, <default>$depth#17, drawCounts: 363
=== time: 29, <default>$depth#18, drawCounts: 399
=== time: 48, <default>$depth#19, drawCounts: 1110
=== time: 27, <default>$depth#20, drawCounts: 318
=== time: 22, <default>$depth#21, drawCounts: 75
=== time: 11, <default>$depth#22, drawCounts: 84
=== time: 27, <default>$depth#23, drawCounts: 24
=== time: 6, <default>$depth#24, drawCounts: 24
=== time: 12, <default>$depth#25, drawCounts: 48
=== time: 33, <default>$depth#26, drawCounts: 303
=== time: 11, <default>$depth#27, drawCounts: 12
=== time: 19, <default>$depth#28, drawCounts: 6
=== time: 12, <default>$depth#29, drawCounts: 72
=== time: 16, <default>$depth#30, drawCounts: 90
=== time: 91, <default>$depth#31, drawCounts: 78
=== time: 32, <default>$depth#32, drawCounts: 255
=== time: 69, <default>$depth#33, drawCounts: 1401
=== time: 178, <default>$depth#34, drawCounts: 465
=== time: 52, <default>$depth#35, drawCounts: 732
=== time: 20, <default>$depth#36, drawCounts: 12
=== time: 20, <default>$depth#37, drawCounts: 48
=== time: 12, <default>$depth#38, drawCounts: 42
=== time: 13, <default>$depth#39, drawCounts: 6
=== time: 18, <default>$depth#40, drawCounts: 12
=== time: 32, <default>$depth#41, drawCounts: 246
=== time: 10, <default>$depth#42, drawCounts: 12
=== time: 40, <default>$depth#43, drawCounts: 6
=== time: 34, <default>$depth#44, drawCounts: 60
=== time: 19, <default>$depth#45, drawCounts: 42
=== time: 11, <default>$depth#46, drawCounts: 18
=== time: 16, <default>$depth#47, drawCounts: 54
=== time: 20, <default>$depth#48, drawCounts: 24
=== time: 21, <default>$depth#49, drawCounts: 33
=== time: 10, <default>$depth#50, drawCounts: 18
=== time: 15, <default>$depth#51, drawCounts: 18
=== time: 46, <default>$depth#52, drawCounts: 555
=== time: 21, <default>$depth#53, drawCounts: 48
=== time: 22, <default>$depth#54, drawCounts: 48
=== time: 11, <default>$depth#55, drawCounts: 24
=== time: 14, <default>$depth#56, drawCounts: 18
=== time: 11, <default>$depth#57, drawCounts: 24
=== time: 14, <default>$depth#58, drawCounts: 18
=== time: 10, <default>$depth#59, drawCounts: 6
=== time: 18, <default>$depth#60, drawCounts: 117
=== time: 17, <default>$depth#61, drawCounts: 102
=== time: 24, <default>$depth#62, drawCounts: 72
=== time: 24, <default>$depth#63, drawCounts: 126
=== time: 20, <default>$depth#64, drawCounts: 12
=== time: 19, <default>$depth#65, drawCounts: 6
=== time: 10, <default>$depth#66, drawCounts: 12
=== time: 20, <default>$depth#67, drawCounts: 24
=== time: 8, <default>$depth#68, drawCounts: 72
=== time: 23, <default>$depth#69, drawCounts: 12
=== time: 22, <default>$depth#70, drawCounts: 48
=== time: 22, <default>$depth#71, drawCounts: 216
=== time: 7, <default>$depth#72, drawCounts: 36
=== time: 134, <default>$depth#73, drawCounts: 1713
=== time: 31, <default>$depth#74, drawCounts: 12
=== time: 18, <default>$depth#75, drawCounts: 6
=== time: 48, <default>$depth#76, drawCounts: 531
=== time: 24, <default>$depth#77, drawCounts: 237
=== time: 35, <default>$depth#78, drawCounts: 726
=== time: 20, <default>$depth#79, drawCounts: 309
=== time: 124, <default>$depth#80, drawCounts: 2040
=== time: 25, <default>$depth#81, drawCounts: 138
=== time: 7, <default>$depth#82, drawCounts: 9
=== time: 12, <default>$depth#83, drawCounts: 6
=== time: 1860, <default>$depth#84, drawCounts: 18
=== time: 1743, <default>$depth#85, drawCounts: 18
=== time: 23, <default>$depth#86, drawCounts: 12
=== time: 12, <default>$depth#87, drawCounts: 72
=== time: 13, <default>$depth#88, drawCounts: 12
=== time: 14, <default>$depth#89, drawCounts: 66
=== time: 23, <default>$depth#90, drawCounts: 234
=== time: 26, <default>$depth#91, drawCounts: 96
=== time: 6, <default>$depth#92, drawCounts: 6
=== time: 1054, <default>$depth#93, drawCounts: 3
=== time: 32, <default>$depth#94, drawCounts: 195
=== time: 7, <default>$depth#95, drawCounts: 12
=== time: 14, <default>$depth#96, drawCounts: 36
=== time: 13, <default>$depth#97, drawCounts: 54
=== time: 30, <default>$depth#98, drawCounts: 66
=== time: 56, <default>$depth#99, drawCounts: 885
=== time: 1815, <default>$depth#100, drawCounts: 444
=== time: 23, <default>$depth#101, drawCounts: 12
=== time: 80, <default>$depth#102, drawCounts: 540
=== time: 1387, <default>$depth#103, drawCounts: 12
=== time: 1557, <default>$depth#104, drawCounts: 12
=== time: 87, <default>$depth#105, drawCounts: 450
=== time: 1433, <default>$depth#106, drawCounts: 12
=== time: 17, <default>$depth#107, drawCounts: 6
=== time: 2939, <default>$depth#108, drawCounts: 276
=== time: 21, <default>$depth#109, drawCounts: 237
=== time: 634, <default>$depth#110, drawCounts: 435
=== time: 27, <default>$depth#111, drawCounts: 84
=== time: 743, <default>$depth#112, drawCounts: 36
=== time: 33, <default>$depth#113, drawCounts: 297
=== time: 9, <default>$depth#114, drawCounts: 9
=== time: 13, <default>$depth#115, drawCounts: 15
=== time: 11, <default>$depth#116, drawCounts: 6
=== time: 14, <default>$depth#117, drawCounts: 12
=== time: 1205, <default>$depth#118, drawCounts: 66
=== time: 1212, <default>$depth#119, drawCounts: 126
=== time: 1384, <default>$depth#120, drawCounts: 54
=== time: 1402, <default>$depth#121, drawCounts: 21
=== time: 25, <default>$depth#122, drawCounts: 12
=== time: 1105, <default>$depth#123, drawCounts: 360
=== time: 1049, <default>$depth#124, drawCounts: 564
=== time: 1667, <default>$depth#125, drawCounts: 6
=== time: 29, <default>$depth#126, drawCounts: 36
=== time: 1804, <default>$depth#127, drawCounts: 6
=== time: 28, <default>$depth#128, drawCounts: 96
=== time: 508, <default>$depth#129, drawCounts: 1536
=== time: 1215, <default>$depth#130, drawCounts: 24
=== time: 785, <default>$depth#131, drawCounts: 936
=== time: 1329, <default>$depth#132, drawCounts: 18
=== time: 25, <default>$depth#133, drawCounts: 36
=== time: 1557, <default>$depth#134, drawCounts: 24
=== time: 26, <default>$depth#135, drawCounts: 63
=== time: 668, <default>$depth#136, drawCounts: 723
=== time: 738, <default>$depth#137, drawCounts: 96
=== time: 1503, <default>$depth#138, drawCounts: 195
=== time: 27, <default>$depth#139, drawCounts: 27
=== time: 10, <default>$depth#140, drawCounts: 6
=== time: 14, <default>$depth#141, drawCounts: 12
=== time: 10, <default>$depth#142, drawCounts: 12
=== time: 16, <default>$depth#143, drawCounts: 66
=== time: 10, <default>$depth#144, drawCounts: 36
=== time: 13, <default>$depth#145, drawCounts: 6
=== time: 1324, <default>$depth#146, drawCounts: 12
=== time: 41, <default>$depth#147, drawCounts: 318
=== time: 6, <default>$depth#148, drawCounts: 12
=== time: 9, <default>$depth#149, drawCounts: 6
=== time: 14, <default>$depth#150, drawCounts: 48
=== time: 1381, <default>$depth#151, drawCounts: 36
=== time: 1453, <default>$depth#152, drawCounts: 168
=== time: 755, <default>$depth#153, drawCounts: 423
=== time: 632, <default>$depth#154, drawCounts: 612
=== time: 37, <default>$depth#155, drawCounts: 240
=== time: 1168, <default>$depth#156, drawCounts: 42
=== time: 26, <default>$depth#157, drawCounts: 192
=== time: 1401, <default>$depth#158, drawCounts: 12
=== time: 1299, <default>$depth#159, drawCounts: 192
=== time: 868, <default>$depth#160, drawCounts: 126
=== time: 628, <default>$depth#161, drawCounts: 9
=== time: 1962, <default>$depth#162, drawCounts: 6
=== time: 13, <default>$depth#163, drawCounts: 36
=== time: 1677, <default>$depth#164, drawCounts: 12
=== time: 13, <default>$depth#165, drawCounts: 12
=== time: 6, <default>$depth#166, drawCounts: 12
=== time: 5, <default>$depth#167, drawCounts: 12
=== time: 6, <default>$depth#168, drawCounts: 12
=== time: 5, <default>$depth#169, drawCounts: 24
=== time: 9, <default>$depth#170, drawCounts: 99
=== time: 5, <default>$depth#171, drawCounts: 15
=== time: 4, <default>$depth#172, drawCounts: 6
=== time: 5, <default>$depth#173, drawCounts: 12
=== time: 5, <default>$depth#174, drawCounts: 12
=== time: 4, <default>$depth#175, drawCounts: 6
=== time: 5, <default>$depth#176, drawCounts: 6
=== time: 5, <default>$depth#177, drawCounts: 12
=== time: 5, <default>$depth#178, drawCounts: 24
=== time: 6, <default>$depth#179, drawCounts: 60
=== time: 6, <default>$depth#180, drawCounts: 18
=== time: 6, <default>$depth#181, drawCounts: 24
=== time: 1037, <default>$depth#182, drawCounts: 192
=== time: 670, <default>$depth#183, drawCounts: 99
=== time: 718, <default>$depth#184, drawCounts: 153
=== time: 1259, <default>$depth#185, drawCounts: 18
=== time: 602, <default>$depth#186, drawCounts: 363
=== time: 1355, <default>$depth#187, drawCounts: 12
=== time: 1259, <default>$depth#188, drawCounts: 99
=== time: 1764, <default>$depth#189, drawCounts: 12
=== time: 12, <default>$depth#190, drawCounts: 18
=== time: 585, <default>$depth#191, drawCounts: 429
=== time: 1110, <default>$depth#192, drawCounts: 516
=== time: 1119, <default>$depth#193, drawCounts: 870
=== time: 723, <default>$depth#194, drawCounts: 636
=== time: 11, <default>$depth#195, drawCounts: 15
=== time: 6, <default>$depth#196, drawCounts: 9
=== time: 5, <default>$depth#197, drawCounts: 18
=== time: 5, <default>$depth#198, drawCounts: 12
=== time: 1173, <default>$depth#199, drawCounts: 165
=== time: 1465, <default>$depth#200, drawCounts: 72
=== time: 13, <default>$depth#201, drawCounts: 12
=== time: 3507, <default>$depth#202, drawCounts: 27
=== time: 1067, <default>$depth#203, drawCounts: 762
=== time: 1176, <default>$depth#204, drawCounts: 1548
=== time: 1300, <default>$depth#205, drawCounts: 30
=== time: 1804, <default>$depth#206, drawCounts: 12
=== time: 13, <default>$depth#207, drawCounts: 36
=== time: 1836, <default>$depth#208, drawCounts: 276
=== time: 15, <default>$depth#209, drawCounts: 129
=== time: 597, <default>$depth#210, drawCounts: 675
=== time: 1340, <default>$depth#211, drawCounts: 48
=== time: 696, <default>$depth#212, drawCounts: 615
=== time: 12, <default>$depth#213, drawCounts: 21
=== time: 1356, <default>$depth#214, drawCounts: 12
=== time: 13, <default>$depth#215, drawCounts: 48
=== time: 6, <default>$depth#216, drawCounts: 36
=== time: 1174, <default>$depth#217, drawCounts: 270
=== time: 1475, <default>$depth#218, drawCounts: 21
=== time: 12, <default>$depth#219, drawCounts: 12
=== time: 8, <default>$depth#220, drawCounts: 72
=== time: 620, <default>$depth#221, drawCounts: 582
=== time: 1353, <default>$depth#222, drawCounts: 36
=== time: 1464, <default>$depth#223, drawCounts: 54
=== time: 29, <default>$depth#224, drawCounts: 378
=== time: 691, <default>$depth#225, drawCounts: 192
=== time: 1245, <default>$depth#226, drawCounts: 18
=== time: 16, <default>$depth#227, drawCounts: 12
=== time: 606, <default>$depth#228, drawCounts: 60
=== time: 2282, <default>$depth#229, drawCounts: 174
=== time: 1363, <default>$depth#230, drawCounts: 78
=== time: 1858, <default>$depth#231, drawCounts: 12
=== time: 12, <default>$depth#232, drawCounts: 12
=== time: 10, <default>$depth#233, drawCounts: 102
=== time: 5, <default>$depth#234, drawCounts: 12
=== time: 6, <default>$depth#235, drawCounts: 12
=== time: 568, <default>$depth#236, drawCounts: 117
=== time: 1332, <default>$depth#237, drawCounts: 12
=== time: 1481, <default>$depth#238, drawCounts: 174
=== time: 11, <default>$depth#239, drawCounts: 9
=== time: 6, <default>$depth#240, drawCounts: 36
=== time: 5, <default>$depth#241, drawCounts: 12
=== time: 85, <default>$depth#242, drawCounts: 264
=== time: 1253, models/mapobjects/vega/globe/globe$depth#0, drawCounts: 1260
=== layout numComponents: 4, componentType: 0x1401, normalize: 1, stride: 32
=== time: 71, textures/shared_vega/light01_300#0, drawCounts: 24
=== time: 6, textures/shared_vega/light01_300#1, drawCounts: 18
=== time: 17, textures/shared_vega/panel01#0, drawCounts: 114
=== time: 589, textures/shared_vega/panel01#1, drawCounts: 15
=== time: 13, textures/shared_vega/panel01#2, drawCounts: 36
=== time: 6, textures/shared_vega/panel01#3, drawCounts: 21
=== time: 6, textures/shared_vega/panel01#4, drawCounts: 15
=== time: 5, textures/shared_vega/panel01#5, drawCounts: 9
=== time: 5, textures/shared_vega/panel01#6, drawCounts: 21
=== time: 5, textures/shared_vega/panel01#7, drawCounts: 9
=== time: 28, textures/shared_vega/trim07#0, drawCounts: 54
=== time: 6, textures/shared_vega/trim07#1, drawCounts: 36
=== time: 8, textures/shared_vega/trim07#2, drawCounts: 72
=== time: 13, textures/shared_vega/trim01#0, drawCounts: 78
=== time: 5, textures/shared_vega/trim01#1, drawCounts: 12
=== time: 5, textures/shared_vega/trim01#2, drawCounts: 18
=== time: 5, textures/shared_vega/trim01#3, drawCounts: 12
=== time: 6, textures/shared_vega/trim01#4, drawCounts: 36
=== time: 5, textures/shared_vega/trim01#5, drawCounts: 12
=== time: 27, textures/shared_vega/panel07#0, drawCounts: 144
=== time: 11, textures/shared_vega/panel07#1, drawCounts: 144
=== time: 46, textures/shared_vega/base01#0, drawCounts: 486
=== time: 43, textures/shared_vega/base01#1, drawCounts: 552
=== time: 14, textures/shared_vega/base01#2, drawCounts: 255
=== time: 39, textures/shared_vega/base01#3, drawCounts: 1281
=== time: 9, textures/shared_vega/base01#4, drawCounts: 96
=== time: 22, textures/shared_vega/base01#5, drawCounts: 363
=== time: 21, textures/shared_vega/base01#6, drawCounts: 399
=== time: 40, textures/shared_vega/base01#7, drawCounts: 1110
=== time: 7, textures/shared_vega/base01#8, drawCounts: 48
=== time: 15, textures/shared_vega/base01#9, drawCounts: 255
=== time: 62, textures/shared_vega/base01#10, drawCounts: 1401
=== time: 26, textures/shared_vega/base01#11, drawCounts: 465
=== time: 5, textures/shared_vega/base01#12, drawCounts: 18
=== time: 8, textures/shared_vega/base01#13, drawCounts: 117
=== time: 10, textures/shared_vega/base01#14, drawCounts: 102
=== time: 30, textures/shared_vega/base01#15, drawCounts: 531
=== time: 13, textures/shared_vega/base01#16, drawCounts: 237
=== time: 27, textures/shared_vega/base01#17, drawCounts: 726
=== time: 13, textures/shared_vega/base01#18, drawCounts: 309
=== time: 6, textures/shared_vega/base01#19, drawCounts: 66
=== time: 14, textures/shared_vega/base01#20, drawCounts: 237
=== time: 23, textures/shared_vega/base01#21, drawCounts: 435
=== time: 19, textures/shared_vega/base01#22, drawCounts: 84
=== time: 10, textures/shared_vega/base01#23, drawCounts: 96
=== time: 7, textures/shared_vega/base01#24, drawCounts: 63
=== time: 35, textures/shared_vega/base01#25, drawCounts: 723
=== time: 5, textures/shared_vega/base01#26, drawCounts: 6
=== time: 32, textures/shared_vega/base01#27, drawCounts: 612
=== time: 13, textures/shared_vega/base01#28, drawCounts: 240
=== time: 9, textures/shared_vega/base01#29, drawCounts: 99
=== time: 5, textures/shared_vega/base01#30, drawCounts: 18
=== time: 6, textures/shared_vega/base01#31, drawCounts: 24
=== time: 20, textures/shared_vega/base01#32, drawCounts: 363
=== time: 24, textures/shared_vega/base01#33, drawCounts: 429
=== time: 5, textures/shared_vega/base01#34, drawCounts: 12
=== time: 50, textures/shared_vega/base01#35, drawCounts: 129
=== time: 29, textures/shared_vega/base01#36, drawCounts: 675
=== time: 33, textures/shared_vega/base01#37, drawCounts: 582
=== time: 4, textures/shared_vega/base01#38, drawCounts: 12
=== time: 9, textures/shared_vega/base01#39, drawCounts: 117
=== time: 780, textures/shared_vega/base03a#0, drawCounts: 786
=== time: 859, textures/shared_vega/base03a#1, drawCounts: 2706
=== time: 724, textures/shared_vega/base03a#2, drawCounts: 1233
=== time: 732, textures/shared_vega/base03a#3, drawCounts: 519
=== time: 14, textures/shared_vega/base03a#4, drawCounts: 99
=== time: 12, textures/shared_vega/base03a#5, drawCounts: 153
=== time: 770, textures/shared_vega/base03a#6, drawCounts: 636
=== time: 700, textures/shared_vega/base03a#7, drawCounts: 615
=== time: 666, textures/shared_vega/base03a#8, drawCounts: 378
=== time: 19, textures/shared_vega/base03a#9, drawCounts: 192
=== time: 1259, textures/shared_vega/base01a#0, drawCounts: 2850
=== time: 661, textures/shared_vega/base01a#1, drawCounts: 2100
=== time: 1160, textures/shared_vega/base01a#2, drawCounts: 42
=== time: 21, textures/shared_vega/base01a#3, drawCounts: 192
=== time: 1380, textures/shared_vega/base01a#4, drawCounts: 192
=== time: 1108, textures/shared_vega/base01a#5, drawCounts: 516
=== time: 1150, textures/shared_vega/base01a#6, drawCounts: 870
=== time: 1032, textures/shared_vega/base01a#7, drawCounts: 762
=== time: 1089, textures/shared_vega/base01a#8, drawCounts: 1548
=== time: 1052, textures/shared_vega/base01a#9, drawCounts: 102
=== time: 1264, textures/shared_vega/trim03a#0, drawCounts: 390
=== time: 14, textures/shared_vega/trim03a#1, drawCounts: 90
=== time: 804, textures/shared_vega/trim03a#2, drawCounts: 246
=== time: 15, textures/shared_vega/trim03a#3, drawCounts: 12
=== time: 1260, textures/shared_vega/trim03a#4, drawCounts: 555
=== time: 1190, textures/shared_vega/trim03a#5, drawCounts: 24
=== time: 13, textures/shared_vega/trim03a#6, drawCounts: 72
=== time: 15, textures/shared_vega/trim03a#7, drawCounts: 234
=== time: 7, textures/shared_vega/trim03a#8, drawCounts: 66
=== time: 1149, textures/shared_vega/trim03a#9, drawCounts: 126
=== time: 1188, textures/shared_vega/trim03a#10, drawCounts: 36
=== time: 12, textures/shared_vega/trim03a#11, drawCounts: 18
=== time: 1213, textures/shared_vega/trim03a#12, drawCounts: 345
=== time: 11, textures/shared_vega/trim03a#13, drawCounts: 12
=== time: 6, textures/shared_vega/trim03a#14, drawCounts: 36
=== time: 6, textures/shared_vega/trim03a#15, drawCounts: 18
=== time: 9, textures/shared_vega/trim03a#16, drawCounts: 99
=== time: 11, textures/shared_vega/trim03a#17, drawCounts: 165
=== time: 1154, textures/shared_vega/trim03a#18, drawCounts: 270
=== time: 20, textures/shared_vega/trim03a#19, drawCounts: 174
=== time: 1160, textures/shared_vega/trim03a#20, drawCounts: 264
=== time: 1345, textures/shared_vega/base03#0, drawCounts: 642
=== time: 2532, textures/shared_vega/base03#1, drawCounts: 816
=== time: 1352, textures/shared_vega/base03#2, drawCounts: 12
=== time: 1306, textures/shared_vega/base03#3, drawCounts: 396
=== time: 1373, textures/shared_vega/trim02#0, drawCounts: 30
=== time: 1416, textures/shared_vega/panel01a#0, drawCounts: 12
=== time: 1395, textures/shared_vega/trim04#0, drawCounts: 6
=== time: 14, textures/shared_vega/trim04#1, drawCounts: 78
=== time: 1418, textures/shared_vega/trim04#2, drawCounts: 66
=== time: 17, textures/shared_vega/trim04#3, drawCounts: 108
=== time: 4, textures/shared_vega/trim04#4, drawCounts: 6
=== time: 7, textures/shared_vega/trim04#5, drawCounts: 54
=== time: 5, textures/shared_vega/trim04#6, drawCounts: 6
=== time: 3163, textures/shared_vega/trim04#7, drawCounts: 66
=== time: 3517, textures/shared_vega/trim04#8, drawCounts: 72
=== time: 15, textures/shared_vega/trim04#9, drawCounts: 21
=== time: 5, textures/shared_vega/trim04#10, drawCounts: 12
=== time: 7, textures/shared_vega/trim04#11, drawCounts: 78
=== time: 46, textures/shared_vega/base02#0, drawCounts: 72
=== time: 18, textures/shared_vega/base02#1, drawCounts: 138
=== time: 5, textures/shared_vega/base02#2, drawCounts: 12
=== time: 13, textures/shared_vega/base02#3, drawCounts: 54
=== time: 14, textures/shared_vega/base02#4, drawCounts: 84
=== time: 14, textures/shared_vega/base02#5, drawCounts: 54
=== time: 4, textures/shared_vega/base02#6, drawCounts: 12
=== time: 18, textures/shared_vega/trim02a#0, drawCounts: 18
=== time: 19, textures/shared_vega/panel06_300#0, drawCounts: 30
=== time: 7, textures/shared_vega/panel06_300#1, drawCounts: 36
=== time: 6, textures/shared_vega/panel06_300#2, drawCounts: 36
=== time: 30, textures/shared_vega/trim03#0, drawCounts: 240
=== time: 11, textures/shared_vega/trim03#1, drawCounts: 120
=== time: 28, textures/shared_vega/trim03#2, drawCounts: 108
=== time: 21, textures/shared_vega/trim03#3, drawCounts: 72
=== time: 33, textures/shared_vega/trim03b#0, drawCounts: 78
=== time: 48, textures/shared_vega/concrete01#0, drawCounts: 633
=== time: 10, textures/shared_vega/concrete01#1, drawCounts: 84
=== time: 15, textures/shared_vega/concrete01#2, drawCounts: 75
=== time: 14, textures/shared_vega/concrete01#3, drawCounts: 84
=== time: 23, textures/shared_vega/concrete01#4, drawCounts: 303
=== time: 8, textures/shared_vega/concrete01#5, drawCounts: 48
=== time: 15, textures/shared_vega/concrete01#6, drawCounts: 24
=== time: 12, textures/shared_vega/concrete01#7, drawCounts: 33
=== time: 8, textures/shared_vega/concrete01#8, drawCounts: 126
=== time: 18, textures/shared_vega/concrete01#9, drawCounts: 48
=== time: 24, textures/shared_vega/concrete01#10, drawCounts: 195
=== time: 19, textures/shared_vega/concrete01#11, drawCounts: 207
=== time: 9, textures/shared_vega/concrete01#12, drawCounts: 18
=== time: 19, textures/shared_vega/concrete01#13, drawCounts: 174
=== time: 34, textures/shared_vega/trim01a#0, drawCounts: 84
=== time: 19, textures/shared_vega/trim01a#1, drawCounts: 12
=== time: 11, textures/shared_vega/trim01a#2, drawCounts: 72
=== time: 117, models/mapobjects/vega/globe/globe#0, drawCounts: 1260
=== time: 2826, models/mapobjects/vega/container/container2#0, drawCounts: 276
=== time: 40, models/mapobjects/vega/container/container2#1, drawCounts: 276
=== time: 29, textures/shared_vega/floor02#0, drawCounts: 36
=== time: 55, textures/shared_vega/floor02#1, drawCounts: 36
=== time: 10, textures/shared_vega/floor02#2, drawCounts: 18
=== time: 15, textures/shared_vega/floor02#3, drawCounts: 36
=== time: 1812, textures/shared_vega/floor01#0, drawCounts: 51
=== time: 33, textures/shared_vega/floor01#1, drawCounts: 195
=== time: 9, textures/shared_vega/floor01#2, drawCounts: 27
=== time: 11, textures/shared_vega/floor01#3, drawCounts: 12
=== time: 11, textures/shared_vega/floor01#4, drawCounts: 27
=== time: 14, textures/shared_vega/floor01#5, drawCounts: 24
=== time: 11, textures/shared_vega/floor01#6, drawCounts: 12
=== time: 1575, textures/shared_vega/dirt01#0, drawCounts: 6
=== time: 23, textures/shared_vega/dirt01#1, drawCounts: 6
=== time: 1656, textures/shared_vega/light03_1500#0, drawCounts: 36
=== time: 1570, textures/shared_vega/light03_1500#1, drawCounts: 30
=== time: 1893, textures/shared_vega/panel05_300#0, drawCounts: 72
=== time: 26, textures/shared_vega/panel05_300#1, drawCounts: 18
=== time: 26, textures/shared_vega/panel02#0, drawCounts: 18
=== time: 17, textures/shared_vega/asphalt01#0, drawCounts: 36
=== time: 42, models/mapobjects/vega/monitor/display_door_red#0, drawCounts: 6
=== time: 10, models/mapobjects/vega/monitor/display_door_blue#0, drawCounts: 6
=== time: 1817, models/mapobjects/vega/monitor/monitor#0, drawCounts: 204
=== time: 41, models/mapobjects/vega/emitter/emitter#0, drawCounts: 240
=== time: 1946, textures/vega_custom/starchart02#0, drawCounts: 6
=== time: 62, textures/vega_custom/roadstrip#0, drawCounts: 12
=== time: 29, textures/vega_custom/wheeltracks#0, drawCounts: 72
=== time: 31, textures/vega_custom/vega_banner#0, drawCounts: 48
=== time: 30, textures/shared_vega/grate01#0, drawCounts: 90
=== time: 1548, textures/shared_vega/grate01#1, drawCounts: 60
=== time: 27, textures/shared_vega/grate01#2, drawCounts: 60
=== time: 64, textures/vega_custom/holo_projector#0, drawCounts: 48
=== time: 25, textures/shared_vega/glass01#0, drawCounts: 12
=== time: 2, textures/shared_vega/glass01#1, drawCounts: 6
=== time: 9, textures/shared_vega/glass01#2, drawCounts: 36
=== time: 8, textures/shared_vega/glass01#3, drawCounts: 18
=== time: 4, textures/shared_vega/glass01#4, drawCounts: 12
=== time: 5, textures/shared_vega/glass01#5, drawCounts: 6
=== time: 1, textures/shared_vega/glass01#6, drawCounts: 126
=== time: 24, textures/shared_vega/glass01#0, drawCounts: 12
=== time: 9, textures/shared_vega/glass01#1, drawCounts: 6
=== time: 15, textures/shared_vega/glass01#2, drawCounts: 36
=== time: 8, textures/shared_vega/glass01#3, drawCounts: 18
=== time: 62, textures/shared_vega/glass01#4, drawCounts: 12
=== time: 9, textures/shared_vega/glass01#5, drawCounts: 6
=== time: 1524, textures/shared_vega/glass01#6, drawCounts: 126
=== time: 73, textures/shared_vega/glass01#0, drawCounts: 12
=== time: 5, textures/shared_vega/glass01#1, drawCounts: 6
=== time: 15, textures/shared_vega/glass01#2, drawCounts: 36
=== time: 7, textures/shared_vega/glass01#3, drawCounts: 18
=== time: 10, textures/shared_vega/glass01#4, drawCounts: 12
=== time: 5, textures/shared_vega/glass01#5, drawCounts: 6
=== time: 20, textures/shared_vega/glass01#6, drawCounts: 126
=== layout numComponents: 3, componentType: 0x1406, normalize: 0, stride: 32
=== layout numComponents: 4, componentType: 0x140b, normalize: 0, stride: 32
=== layout numComponents: 4, componentType: 0x1401, normalize: 1, stride: 32
=== time: 51, textures/shared_vega/glass01#0, drawCounts: 6

And 0x1402 is GL_SHORT and 0x1401 is GL_UNSIGNED_BYTE and the R300 limitations are said to be:

Vertex formats GL_*INT and GL_DOUBLE are not supported. GL_*SHORT is supported only for 2- and 4-component vertex attributes, and GL_*BYTE only for 4-component attributes. All individual vertex attribute fetches must be DWORD-aligned. Quads and quadstrips cannot have the first vertex be the provoking vertex.

It looks like everything looks good.

Well I don't know why there are some 0x140B (GL_HALF_FLOAT) and 0x1406 (GL_FLOAT) but I'm not sure this is used, there is no Tess_DrawElements call immediately following them. And other things done in tr_vbo.cpp are setting GL_HALF with 4 components, which is said to be OK. It looks like the ones set in tr_vbo.cpp are what we seen being used in Tess_DrawElements as modifying them to GL_INT for example modifies what is set by Tess_DrawElements.

@illwieckz
Copy link
Member Author

illwieckz commented Jun 3, 2024

Well, maybe the thing is the usage of GL_HALF_FLOAT, in ioq3 renderergl2 tr_vbo.c, there is no GL_HALF_FLOAT set for TEXCOORD, but GL_FLOAT, if I do the same in our tr_vbo.cpp I obviously get wrong textures (I assume there is a mismatch with some half float functions used there and there), but, I get almost 48fps (even 50fps with r_smp 1), which is the performance I achieve with ioq3 renderergl2. Maybe the r300 driver lies about the support of ARB_half_float_vertex and emulates it in our back?

With GL_FLOAT and r_smp 0, 48fps:

unvanquished_2024-06-04_011926_000

With GL_FLOAT and r_smp 1, 50fps:

unvanquished_2024-06-04_012340_000

@illwieckz
Copy link
Member Author

With GL_FLOAT and r_smp 1 on Thunder, 48fps:

unvanquished_2024-06-04_012834_000

@illwieckz
Copy link
Member Author

The single change that recovers the performance (while breaking the texture rendering) is to do:

--- a/src/engine/renderer/tr_vbo.cpp
+++ b/src/engine/renderer/tr_vbo.cpp
@@ -220,7 +220,7 @@ static void R_SetAttributeLayoutsStatic( VBO_t *vbo )
        vbo->attribs[ ATTR_INDEX_QTANGENT ].frameOffset   = 0;
 
        vbo->attribs[ ATTR_INDEX_TEXCOORD ].numComponents = 4;
-       vbo->attribs[ ATTR_INDEX_TEXCOORD ].componentType = GL_HALF_FLOAT;
+       vbo->attribs[ ATTR_INDEX_TEXCOORD ].componentType = GL_FLOAT;
        vbo->attribs[ ATTR_INDEX_TEXCOORD ].normalize     = GL_FALSE;
        vbo->attribs[ ATTR_INDEX_TEXCOORD ].ofs           = offsetof( shaderVertex_t, texCoords );
        vbo->attribs[ ATTR_INDEX_TEXCOORD ].realStride    = sizeShaderVertex;

@illwieckz
Copy link
Member Author

illwieckz commented Jun 4, 2024

Two years ago I started a work-in-progress branch to support graphics cards not having support for ARB_half_float_vertex, like some Intel GL 2.1 cards. I haven't found time to touch it before, especially because my only sample for that Intel card is at some hundreds of km of here I live, so testing was a bit limited.

But preparatory work for it was already merged:

I revived this branch and worked on it a bit more to make it work (it is still in a very rough state), given I wanted to check if using that on an ATI r300 would improve the performances:

Et voilà! 🎉️ I now get 55fps on the default Vega spectator scene:

unvanquished_2024-06-04_042705_000

And 53fps on default Thunder spectator scene:

unvanquished_2024-06-04_043746_000

And I guess I fixed the support for those old GL 2 Intel cards in the process (I may be able to verify it with real hardware in the coming months). 😎️

I assume the r300 driver just emulates half-float vertexes, as it is said it is a hardware limitation to not support GL_HALF_FLOAT as component type for a vertex, while the driver accepts it. This is consistent with the u_vbuf_translate_buffers function name we can see in the call trace. 🙂️

@slipher
Copy link
Member

slipher commented Jun 4, 2024

Is there some necessary reason why the driver has to implement this in the dumbest possible way? I.e. re-translating the attributes on every frame instead of when the VBO is created or used for the first time?

Maybe we could implement attribute reformatting at the last minute like the driver does, instead of modifying all our structs. It appears we have all the struct layouts stored in a tabular format (vbo->attribs). This provides all the information we would need to create a new array on the fly with some component stored as a different type.

@VReaperV
Copy link
Contributor

VReaperV commented Jun 4, 2024

Is there some necessary reason why the driver has to implement this in the dumbest possible way? I.e. re-translating the attributes on every frame instead of when the VBO is created or used for the first time?

Because it doesn't know if the data in it has changed, probably.

Maybe we could implement attribute reformatting at the last minute like the driver does, instead of modifying all our structs. It appears we have all the struct layouts stored in a tabular format (vbo->attribs). This provides all the information we would need to create a new array on the fly with some component stored as a different type.

I don't really get this part, since it wouldn't solve the problem?

@slipher
Copy link
Member

slipher commented Jun 4, 2024

If we know that some data stays the same but the driver doesn't, then we would only have to translate it once, which should fix the performance.

@VReaperV
Copy link
Contributor

VReaperV commented Jun 4, 2024

Ah, yea, that would work, I misunderstood "on the fly" as "per frame".

@illwieckz
Copy link
Member Author

illwieckz commented Jun 4, 2024

Is there some necessary reason why the driver has to implement this in the dumbest possible way? I.e. re-translating the attributes on every frame instead of when the VBO is created or used for the first time?

I will probably report the issue on Mesa side at some point. Something looks weird indeed. Maybe they can do better and providing them our game as a testbed may help.

Maybe we could implement attribute reformatting at the last minute like the driver does, instead of modifying all our structs. It appears we have all the struct layouts stored in a tabular format (vbo->attribs). This provides all the information we would need to create a new array on the fly with some component stored as a different type.

This is currently a very naïve implementation and some conversions may be useless. Also it may indentifies that some data model are badly shaped. Before I touched anything there was already multiple places where some half-float were converted to float for processing then later re-converted to half-float. We may decide to keep everything of this as float before the latest storage step, which would not only avoid duplication of some code to support both float and half-float, but also would make faster the original half-float code.

@slipher
Copy link
Member

slipher commented Jun 7, 2024

Is there some necessary reason why the driver has to implement this in the dumbest possible way?

I investigated my question a bit and found one possibility... In case your R300 does not have the GL_ARB_buffer_storage extension, try the slipher/glbufferdata-init branch. (If it does have the extension, I guess you could still try it by disabling the cvar r_arb_buffer_storage.) With glBufferData we use the GL_STATIC_DRAW hint which is supposed to inform the driver that the contents will not be changed. But maybe due to the way the buffer is initialized, the driver sets some user-is-an-idiot flag and ignores the hint. The patch ensures to populate the buffer at the moment of initialization instead of shortly after.

@illwieckz
Copy link
Member Author

@slipher Running slipher/glbufferdata-init doesn't speed-up anything, even when disabling r_arb_buffer_storage.

@VReaperV
Copy link
Contributor

VReaperV commented Jun 9, 2024

Those hints are very unreliable, the driver is free to just completely ignore them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants