More fps limiter tweaks #1728
Conversation
```c
}
else
{
    I_Sleep(0); // yield
```
Do we need I_Sleep(0)? On Windows, SDL will call Sleep(0), which makes some sense, but I don't think it's portable.
Yes, on Windows it reduces CPU usage for me and gives a nice latency improvement (which directly yielding the thread via e.g. SwitchToThread does not provide). Do you know if there is a POSIX counterpart to Sleep(0)?

At least this should be no worse than the existing implementation, which has many calls to Sleep(0) (when the remaining time is between 1 ms and 2 ms).
I think there is a pthread_yield, but I'd rather avoid that complication. How about I_SleepUS(100) instead of I_Sleep(0), or something similar?
It's too coarse. In my tests, I_SleepUS(100) often sleeps for 1 ms, whereas I_Sleep(0) sleeps for only 10-30 us or so.
Okay, maybe it makes sense to add an I_ThreadYield function: with POSIX it's sched_yield, on Windows it's just Sleep(0). How do you test this?
At the moment I am only able to test on Windows, so I was hoping someone else would try Linux and let me know if I broke anything. The only new parts are the lack of a pure busy-wait at the end and the change to I_SleepUS (both of which improved latency and reduced CPU usage for me, at the cost of slightly higher frame time variance).
> try linux and let me know if I broke anything

I only ran a quick test with mangohud --dlsym woof, but the frametimes were identical between the master branch and this PR. OpenGL render driver, 9900K, RTX 3080, Arch.
This option (included in /O1 and /O2 by default) leads to difficult-to-diagnose performance issues on older machines and makes the linker and optimizer behavior very unpredictable, while providing no performance benefit in return. This change increases binary size by 2-3% but otherwise has no negative impact.

This is consistent with tools like RTSS and helps the fps counter match the limiter setting more often at high fps.
```cmake
# Disable function-level linking
_checked_add_compile_option(/Gy-)
endif()
```
Adding this compiler flag robustly fixes the latency gremlins I've been seeing (see discussion in #1712)
Do you mind if I push a commit to your branch so that you can test the results? Namely reverting this "fix" to test the theory here.
I should have mentioned that I tested that earlier, and it does fix the problem; no #1712 required. It also fixes the joy_enable weirdness.
Interesting, thanks for going through that exercise. I'll let the others comment on the actual fix you're proposing.
> Adding this compiler flag robustly fixes the latency gremlins I've been seeing (see discussion in #1712)

Is this for a specific version of MSVC?
I don't find it that hard to believe. It drastically alters the layout of the binary: different optimizer and linker behavior, etc. -ffunction-sections is apparently the gcc equivalent.

Maybe, but I don't see any difference in performance on my machines. I would rather use -O2 than touch those options.
From this answer, I don't see why we should disable it: https://stackoverflow.com/questions/1834597/what-is-the-comdat-section-used-for

I found it in ZDoom; the commit is from 2008, without much commentary: ZDoom/gzdoom@fb50df2#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR17
In that case you should add -ffunction-sections -Wl,--gc-sections to the gcc/clang builds. Those flags are disabled by default, so this change just brings MSVC behavior into line with the other compilers.
I think the MSVC/GCC developers have their own reasons for setting these defaults. I have a feeling that your latency issues are an SDL/Windows problem; maybe the Woof changes affect this for some reason, but the true problem is not here.
The MSVC defaults are designed for large C++ projects. The COMDAT settings make more sense in that context than they do for us here.
Is this from CapFrameX or something similar? I see two graphs here. The first is raw frametimes, which look about the same. The second is a quantile function of that data, showing a small improvement in the 99.9% frametimes, where microstutters can be annoying (sometimes called 0.1% frametimes). But the change is so small it looks like typical run-to-run measurement error.

Would you mind comparing this PR (with the latest master changes merged in) against a baseline without your recent improvements? Meaning without this PR and without the two changes here, using the framerate limiter at some desired value. That should show a nice improvement on your system, since it will reflect the complete set of improvements so far. It would also be nice to see MSVC profiling results.
@ceski-1 I can check, but I wouldn't expect those other two changes to impact frame times, so I would probably just be reproducing the graph above (the input latency issues are unrelated to framerate).
I mainly added the CapFrameX graph to show there weren't significant regressions in frame pacing with these changes; if there were no real changes in frame times I would be happy, because I'm mainly hunting latency here. Speaking of latency, I am pretty latency sensitive, and with these changes playing Woof is now what I would describe as telepathic.

So I'm very happy with this now, and I promise not to spam any more anti-latency PRs (unless there are regressions 🙂). I appreciate your patience and willingness to humor me on this stuff despite not being able to repro locally. I mainly write these PRs for my own benefit, but hopefully someone else out there with an old machine will see improvements as well.
The only way to reduce input latency on the Woof side is to reduce frame time. And we are only talking about player view rotation; all other inputs are only registered 35 times per second, a Doom engine limitation that we have to keep for gameplay/demo compatibility. But you say that your changes don't affect frame time even on your machine, so how is it possible that they affect input latency? I think the problem is on the SDL/Windows side.
@rfomin The input latency is about getting the latest mouse events from the Windows message pump; it has nothing to do with frame rates. I can detect it just as easily at 60 fps as I can at 500.

To test this hypothesis, I tried reverting #1505. That PR increases the priority of the main Woof thread, which I would expect to further starve input processing of the cycles it needs to do its job, and that's exactly what it appears to do: reverting #1505 lowers latency even further for me, on top of the changes in this branch. I think this is one of those unintended side effects that leads people to recommend not messing with thread priority (or at least not increasing it) 🙂.
My point is that there is nothing we can do about the Windows message pump. All we can do is call

I agree that we shouldn't mess with thread priority without a verifiable result. That change did nothing for me either.
We can allow it the resources to do its job by not thrashing the CPU.
No, that won't help. I think you missed my point: SDL is getting its events downstream from an asynchronous kernel process, which needs CPU resources independent of the Woof thread to fully do its job.
Do you suggest we rewrite the Windows kernel? 😋
Do you have a hyperthreaded CPU? That might make you immune to this problem, since multiple threads can run concurrently on the same core. You could maybe try turning this off and testing, although I think it requires messing with the BIOS, so you may not want to bother.
My suggestion is to try rewriting SDL. Do you really know how SDL receives events? If you improve this, it will help a lot of projects.
Not really (and I don't want to 🙂), but in the debugger I don't see any SDL event processing thread, only one for timers. So I'm pretty sure SDL_PumpEvents happens synchronously in the same thread as the caller. In that case it can't be responsible for this problem: it can only process whatever events are available to it at the time of the call. Even if it did have an asynchronous thread running in the background, we would be back to the problem of priority starvation, etc.
I think that is correct. So we can't do anything about it, right?
But how can changing compiler flags or
Not by modifying SDL, no. This is a kernel level thing.
Fewer cache misses and more opportunities for the OS to context switch into event processing, making more input data available for the later SDL_PumpEvents call, would be my guess. I don't know the microarchitectural details, but you'll have to take my word for it that it actually makes a difference, because I've seen it with my own eyes. It's very reproducible on my machine.
Does reverting this commit alone achieve anything for you? Because, honestly, I'm starting to get confused about which combination of changes is best, which is even better, and which is bestest, you know. 😉
Good question🙂. I tested this and the answer is a resounding "no". It's dependent on the changes in this branch.
I think the general rule of thumb here is that more changes = better. No one improvement supersedes another, and using the whole suite of changes provides the best result. The only exception is #1712, which is no longer strictly necessary with the addition of the
After discussion with @rfomin clarified the issue in my mind, I'm starting to think that maybe the material difference here is not old CPU vs. new CPU, but CPUs with hyperthreading vs. those without. Has anyone else happened to test on a CPU without hyperthreading (or with it disabled in the BIOS)?
My old 2012 laptop doesn't have hyperthreading, and there is no difference. Sorry, but I just don't believe that the
Like Fabian, I am a little bit lost at this point. I don't mind running some tests later with HT on/off, but which build(s) should be tested, and what is being measured?
The changes here help to prevent that scenario. Putting the frame limiter in its own (out-of-line) function makes its alignment independent of changes upstream in I_FinishUpdate, and
Right, those are the arguments from conservatism which I referred to in my comment. That's fine, but in the face of such conservatism, nothing interesting can be done here.
I believe I've answered that. However, you guys control the repository, so I suppose I'm at your mercy there.
For any optimisation work we need to measure results. Where is the conservatism here?
So will you do it?
Reached my limit (heh) on this one. Thanks everyone for the testing and review.
This PR was closed but I still wanted data, so I ran some benchmarks with CapFrameX. Here's a summary of the results:
Conclusion:
Relevant graphs only:
Re-opening this after cooling off a bit, and not wanting to close the door on fruitful investigation.
Thanks, this is interesting. It looks like the latency of
No, why would I? It provides real benefits to me, and no real detriments to anyone else. By the way, if you don't like premature optimization, you should like these changes, because they both remove optimizations. 🙂
These optimisations are made by compilers without our intervention and can be changed by compiler developers. If we add these changes to our code, we will increase complexity for no benefit.
Yes,

Does anyone know about the MSVC
I've never seen anyone use

gcc doesn't do the
@ceski-1 Do you use the 'prefer maximum performance' power management mode in your NVIDIA settings? I've noticed that Woof makes so little demand on the GPU that it can otherwise sometimes downclock itself, which can add 0.5+ ms to render submit according to Special K.

I added results from my Nehalem machine below. I actually prefer playing in the second scenario (normal priority) due to the reduced latency. High priority is somewhat subjectively smoother, but not as much as I would expect given the numbers here. The exception is that with YouTube playing (or even just sitting paused), normal priority has noticeably degraded performance.
This fixes the last little bit of latency flakiness I was seeing.
```diff
@@ -731,10 +731,16 @@ static void I_ResetTargetRefresh(void);
 #define I_CpuPause()
 #endif

+#ifdef _MSC_VER
+#pragma optimize("s", on) // Don't unroll the wait loop
```
Could this be the single most important change in this PR?
I had the same question, but no, this doesn't supersede the other changes in this PR. Testing the various permutations of the three de-optimizations, I still need all three.
Yes. For testing I start with the default nvcpl settings, then I change that, Vsync Off, and Low Latency Mode On. The latter doesn't matter though, since Doom is never GPU limited on this system, so the queue is skipped.

As an alternative to CapFrameX, which is slightly outdated, you should try a newer version of the underlying PresentMon tool (2.0+) directly, since it has a "Click To Photon Latency" metric in both the overlay and the console logging app. The name is misleading, as it doesn't include physical hardware latency, but supposedly it's a good measurement of total PC latency. So if there's an improvement with your changes, you'd see it in that number.

Edit: The name is really misleading, because you only need to move the mouse to get measurements. Clicking is useless, since it's tied to Doom's ticrate for demo compatibility, just like player movement, weapon switching, etc.
@ceski-1 Good find. There is also some good exploration here. His robustSleep function is essentially what we are doing here. It may be good to allow a couple of microseconds of tolerance, which I can add. SDL calls
The high resolution timer solution is interesting too.
I'm not sure either. See #1744 (comment).
Can you check if there is a measurable latency improvement on your system using PresentMon? Overlay: woof_bench_raw.json. Copy it to

You can also measure input-to-present latency like this: ceski-1@939f66f. Use the
Forcing vsync off in the driver may not be totally benign - doing this increases frame times by 0.2-0.3 ms for me, strangely enough (on two different driver revisions, including the most recent). I also noticed that turning off "Enable G-SYNC, G-SYNC compatible" under "Set up G-SYNC" results in a small but noticeable latency improvement for me when using the direct3d9 backend.
This one plays worse for me, unfortunately.
No differences. My click-to-photon averages around 2.6 ms with maxes in the 4-5 range, regardless of what I do. Even turning render batching on/off (which is a huge difference for me) has no apparent impact on the PresentMon numbers.
Thanks, it's cool to see these numbers in real time. I tend to hover around 1 ms at 2x render scale with a 500 fps cap. It's obviously better to be below 1 ms for this "race" from input to present, but none of the changes I tested have a substantive impact on this number (as you would expect).
@gendlin Okay, I have one more request. Please bear with me, the baselines have all shifted and I need to see the data to form an opinion. Test these commits:
For (2), pick your preferred

Additionally, please show me these CapFrameX results, configured as 60-second runs:
@ceski-1 Perceived latency, from best to worst:
master with the "normal priority" change added to it has the same latency as it does without. I recorded while playing through E1M1 (render batching disabled in all cases). I didn't include data for the

There are no significant differences between the CapFrameX reported latency numbers.
For each test case, what does the sensor data (the tab just below the frametimes graph) say for "average CPU max thread load (%)"?
They're all roughly the same: (61, 63, 57) for (master, before, after) respectively.
Also, master with the
Not much insight from the results, unfortunately. There's slightly lower CPU max thread load for "after tweaks"; it's small but consistent across all three systems (~5%). You also perceive input latency to be lower with "after tweaks", but all attempts to measure it have failed.

After reviewing SDL's input code, I have a theory that the changes in this PR seem to help due to SDL's poor raw input handling under certain conditions (8 kHz mice on modern systems, 1 kHz mice on older ones). SDL3 fixes this and the improvements are significant. Unfortunately, there are no plans to backport the fix to SDL2, so we wait or hope for a contributor to backport it.

My point is that there are SDL performance improvements on the horizon, so I'm not sure where to go with this PR. Attempt some profiling? Check the results of this change? I'm out of suggestions.
This branch modifies the frame limiter, improving latency, CPU usage, and accuracy at the cost of occasional higher frame time spikes (due to yielding the current thread). Given that someone using the in-game limiter is probably interested in the lowest possible latency, I think this is a worthwhile compromise. In combination with #1733, the changes in this branch provide great latency results for me. And c50582c fixes the flaky, unpredictable performance that I've been seeing in general.