Prevents OS::get_ticks_usec() from going backwards in time due to OS … #31863
Note that you can put a WARN_PRINT to show when time is detected going backwards:
It actually occurs reasonably frequently on my Linux Mint machine, usually when switching between tasks rather than during gameplay. Task switching may change the CPU core that the timer is queried on, which has been cited as one of the causes of this kind of problem. See: This causes problems further down the line, particularly where code uses unsigned math to compare current and previous timing values, so that a negative delta is interpreted as a huge positive difference. According to some articles, it may be possible to alleviate this on some OSes by reserving a particular core; however, this may be unnecessary for Godot if it only occurs on things like task switching, which shouldn't affect gameplay. It is also possible to use slightly more complex logic, whereby the Godot clock 'resyncs' when it detects time going backwards, but hopefully that will not be necessary.
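To make the unsigned-math failure concrete, here is a minimal sketch. The helper names `naive_delta_usec` and `guarded_delta_usec` are hypothetical and do not exist in Godot; they only illustrate how a backwards step of 10 usec turns into an enormous delta, and one possible way a caller could guard against it.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of Godot's API.

// Naive delta: unsigned subtraction wraps around when now < prev,
// so a small backwards step becomes a value close to 2^64.
uint64_t naive_delta_usec(uint64_t prev, uint64_t now) {
    return now - prev;
}

// Guarded delta: treat a backwards step as zero elapsed time.
uint64_t guarded_delta_usec(uint64_t prev, uint64_t now) {
    return now >= prev ? now - prev : 0;
}
```

With `prev = 1000` and `now = 990`, the naive version returns 2^64 - 10, while the guarded version returns 0.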
These changes should be explained in their method descriptions in the docs.
Ah, good point, I'll try and check this later. 👍 Shouldn't be anything major: get_ticks_usec() and get_ticks_msec() become non-const. Other than that the functionality is the same, it is just a bug fix. Hopefully the internal get_ticks_raw_usec() and _ticks_usec_prev can be hidden from the docs (not sure how doctool works yet, will have to test).
Do the OSes not have APIs to get monotonic time?
Yes; however, afaik, since the arrival of things like multicore CPUs and SpeedStep, old assumptions about timing no longer hold, and these functions aren't guaranteed to give you accurate results. Just google 'QueryPerformanceCounter + backwards' for more articles like those linked.
Just a note to hold on merging this while I do some testing on #31837, it may need the resync modification I mentioned earlier.
@lawnjelly Interesting. Thanks for sharing the details behind this problem. There is quite a can of worms behind it. Reading some articles out there, the recommendations seem to be clamping, as is done in this PR, and then having the counter operate on a thread that has a processor affinity, to handle situations where the issue is related to multi-core hiccups. I still know very little about Godot's backend, but if part of the issue is the result of multi-threading on certain hardware/OSes, shouldn't this also include making the tick query execute in a thread that has been assigned to a single processor? Does the engine's main thread have a processor affinity?
…bugs. On some hardware / OSes calls to timing APIs may occasionally falsely give the impression time is running backwards (e.g. QueryPerformanceCounter). This can cause bugs in any Godot code that assumes time always goes forwards. This PR simply records the previous time returned by the OS, and returns the previous time if the new time is earlier than the previous time. May fix godotengine#31837, godotengine#25166, godotengine#27887.
I don't think you should allow a diff of 0 since the previous call, as users expect time to pass between calls and might try to divide by this difference, resulting in very hard to reproduce bugs.
Good point, maybe we can have a minimum value. Do you reckon 1 usec would be okay, considering it might be converted to float, with possible precision errors?
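As a rough sketch of the minimum-step idea discussed above: the filter below never returns the same value twice, advancing by at least 1 usec per call. `StrictClockFilter` and `usec_to_seconds` are hypothetical names, not Godot code, and the 1 usec minimum is only the value floated in this thread, not a tested recommendation.

```cpp
#include <cstdint>

// Hypothetical filter for illustration; not part of Godot.
// Guarantees each returned value is at least 1 usec later than the last.
struct StrictClockFilter {
    uint64_t prev = 0;
    bool has_prev = false;

    uint64_t filter(uint64_t raw_usec) {
        if (has_prev && raw_usec <= prev) {
            // Raw time stalled or went backwards: step forward by 1 usec.
            raw_usec = prev + 1;
        }
        prev = raw_usec;
        has_prev = true;
        return raw_usec;
    }
};

// Converting an integer delta (not an absolute timestamp) to float seconds.
float usec_to_seconds(uint64_t delta_usec) {
    return static_cast<float>(delta_usec) / 1000000.0f;
}
```

One trade-off with this design: if the filter is polled while the raw clock is stalled, the returned value runs slightly ahead of the raw value until real time catches up. A 1 usec delta computed in integers and then converted stays non-zero as a float; precision is more likely to be lost if absolute timestamps are converted to float before subtracting.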
I'd read this too, but it is well outside my small realm of knowledge. It may well be worth experimenting with if we don't do it already (bearing in mind there may be performance consequences of reserving a core if Godot is running in the background). However, it is unclear to me whether this is an issue on some hardware but not others, depending on how it does timing. This thread, for instance, suggests that the Linux kernel now regularly updates some values in userspace that are cheap to read, and I don't know how that ties in with the idea of swapping cores affecting the value read: I think we need someone more familiar with the hardware to help us make decisions on that front. In my mind, though, the first step is to make sure we have as robust a timer as possible, one that can cope with violations of assumptions (such as time always going forward), and that is something we can do.
I now suspect this is an issue, certainly with the current fix here, so I will close this PR and come up with an alternative way of doing this. Currently get_ticks_usec() may be called from multiple threads in Godot (I don't know that it is, but it could be), which may lead to time shifts on each call if these threads are on different cores. The main bug scenario concerns the call from main::iteration, which drives the main delta, so rejigging the synchronization via a wrapper in other threads could mess with this. To start with, I will try to solve the main::iteration backwards timing issue, probably by making a single central call to the OS timing API once per frame, from the main thread, and ensuring this is synchronized as best we can. It is also probably best if, in as many circumstances as possible throughout Godot and plugins, we don't call the OS function and instead use our locally stored, per-frame timer. Thinking about it, this is what I normally see in games: typically you want access to 2 times:
In Godot, however, a find in files for get_ticks_usec shows it being used in approx 30 different files. Maybe we need to change our timing paradigm slightly to properly deal with this problem, and use indirect queries to our own stored timing values. Other uses for timers are for profiling, in which case the vagueness of the current situation might be okay. I am not sure about the other cases, and how crucial it is on a case by case basis to deal with these issues.
I can only imagine, as the other issues about delta suggest, that the processing loops would be impacted by this. So without a reliable delta across all systems, there would be systems where games would be unplayable. Everything could potentially break if a player's system bumps into these conditions... Sounds like it could be a very critical problem. 😟
Yes, absolutely, but it should be relatively easy for us to fix. The main problem now is the timing paradigm used within Godot: time is measured via the OS API in multiple places throughout Godot, in scripts, and presumably in plugins etc. This should be changed imo. 😐 In most cases what is needed is not 'the time at which that particular instruction runs' but a global frame time, which already exists in Engine::_frame_ticks (but is not really used). We need an explicit split between 2 functions:
If we were feeling cheeky we could simply change the get_ticks_usec function to return the frame time instead, change this in the docs, and then have a separate function to query the current time. But there is the possibility this might be compatibility breaking in some cases. Or, perhaps more likely, we could make Engine::_frame_ticks accessible from script etc, give it a more sexy name, and encourage users to use it rather than the old function (perhaps emphasize this in the docs, or make the function deprecated and add a one-off warning message). It's one of those things that is easy to change in the Godot source code but more difficult to change in the habits of users (people writing games and modules etc).
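The frame-time versus current-time split could look roughly like the sketch below. `FrameClock` and its method names are hypothetical (Godot's actual `Engine::_frame_ticks` plumbing is not shown in this thread), and `std::chrono::steady_clock` stands in for the platform timing call. The idea is that the OS is queried once per frame from the main thread, and everything else reads the stored value.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical per-frame clock for illustration; not Godot's Engine class.
class FrameClock {
public:
    // Call once per frame from the main thread.
    void begin_frame() { begin_frame_at(query_now_usec()); }

    // Split out so a sampled value can be supplied directly (e.g. in tests).
    void begin_frame_at(uint64_t now_usec) {
        if (!_started) {
            _frame_ticks = now_usec;
            _frame_delta = 0;
            _started = true;
            return;
        }
        if (now_usec < _frame_ticks) {
            // Backwards step: hold the previous frame time, delta becomes 0.
            now_usec = _frame_ticks;
        }
        _frame_delta = now_usec - _frame_ticks;
        _frame_ticks = now_usec;
    }

    // Cheap reads of stored values; no OS call involved.
    uint64_t get_frame_ticks_usec() const { return _frame_ticks; }
    uint64_t get_frame_delta_usec() const { return _frame_delta; }

    // "Time right now", for profiling and similar uses.
    static uint64_t query_now_usec() {
        using namespace std::chrono;
        return static_cast<uint64_t>(
                duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count());
    }

private:
    uint64_t _frame_ticks = 0;
    uint64_t _frame_delta = 0;
    bool _started = false;
};
```

Gameplay code would read `get_frame_ticks_usec()` and `get_frame_delta_usec()`, while only profiling-style code would call `query_now_usec()` directly.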
To anyone that can reproduce the original issue The patch just tries to avoid some unnecessary math, limiting potential overflows and hopefully minimizing numerical errors: (there is also a 3.1 version of this patch: https://github.com/Faless/godot/tree/spike/clock_info_3.1 )
I will try but it will have to be next week. Although I don't think the effects were due to numerical error converting to usecs. I gave up the investigation in the end, but to summarise:
I suspect that the earlier issues that reported backward timing were due to the use of CLOCK_MONOTONIC, which is subject to NTP adjustment, rather than CLOCK_MONOTONIC_RAW, and that this could cause this kind of issue. But I think that was fixed in #22424. The negative timings I was getting could be cured by putting a mutex around the clock_gettime call. Just found this, which is indicative of at least some of the issues that used to exist (whether they have been cured or not now I don't know). It may not be the problem on my Kaby Lake CPU, though, as it has constant_tsc, tsc_adjust, rdtscp, tsc_known_freq, nonstop_tsc. https://linux.die.net/man/3/clock_gettime
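A minimal sketch of the mutex idea mentioned above, assuming a Linux target where `CLOCK_MONOTONIC_RAW` is available (it falls back to `CLOCK_MONOTONIC` elsewhere). The function name `locked_ticks_usec` is hypothetical, and the clamp inside the lock is an extra safeguard added for this illustration rather than something described in the thread.

```cpp
#include <cstdint>
#include <mutex>
#include <time.h>

// CLOCK_MONOTONIC_RAW is Linux-specific; fall back where it is missing.
#ifdef CLOCK_MONOTONIC_RAW
#define SKETCH_CLOCK_ID CLOCK_MONOTONIC_RAW
#else
#define SKETCH_CLOCK_ID CLOCK_MONOTONIC
#endif

// Hypothetical function for illustration; not Godot's implementation.
uint64_t locked_ticks_usec() {
    static std::mutex clock_mutex;
    static uint64_t prev_usec = 0;

    // Serialise the clock read and the comparison across threads.
    std::lock_guard<std::mutex> lock(clock_mutex);

    struct timespec ts;
    clock_gettime(SKETCH_CLOCK_ID, &ts);
    uint64_t now = static_cast<uint64_t>(ts.tv_sec) * 1000000ULL +
            static_cast<uint64_t>(ts.tv_nsec) / 1000ULL;

    // Assumption added for this sketch: also clamp under the same lock.
    if (now < prev_usec) {
        now = prev_usec;
    }
    prev_usec = now;
    return now;
}
```

The lock adds a cost to every call, which is one more reason to prefer a single per-frame query over many scattered calls.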
…bugs.
On some hardware / OSes calls to timing APIs may occasionally falsely give the impression time is running backwards (e.g. QueryPerformanceCounter). This can cause bugs in any Godot code that assumes time always goes forwards. This PR simply records the previous time returned by the OS, and returns the previous time if the new time is earlier than the previous time.
May fix #31837, #25166, #26887. Mentioned in #31016.
It does this by renaming the original OS functions from get_ticks_usec() to get_ticks_raw_usec() (and making them protected, to emphasise that they should not be called directly), and by turning get_ticks_usec() into a wrapper which enforces the logic.
This way none of the other Godot code needs to be changed, and it can assume that deltas will always be positive, rather than having each bit of code deal with this possibility.
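The wrapper described above might look roughly like this sketch. The names `get_ticks_usec()`, `get_ticks_raw_usec()` and `_ticks_usec_prev` come from this PR; the surrounding `ClampedClock` class and the `ScriptedClock` test double are hypothetical stand-ins, since Godot's real `OS` class is not reproduced here.

```cpp
#include <cstdint>

// Stand-in for the platform layer; Godot's real OS class differs.
class ClampedClock {
public:
    virtual ~ClampedClock() {}

    // Non-const because it updates _ticks_usec_prev.
    // Never returns a value smaller than one it returned before.
    uint64_t get_ticks_usec() {
        uint64_t now = get_ticks_raw_usec();
        if (now < _ticks_usec_prev) {
            // The OS reported time going backwards: hold the previous value.
            // (A warning could be printed here while debugging.)
            return _ticks_usec_prev;
        }
        _ticks_usec_prev = now;
        return now;
    }

protected:
    // Raw platform query; protected so callers go through the wrapper.
    virtual uint64_t get_ticks_raw_usec() = 0;

private:
    uint64_t _ticks_usec_prev = 0;
};

// Hypothetical test double that replays scripted raw readings.
class ScriptedClock : public ClampedClock {
public:
    uint64_t next_raw = 0;

protected:
    uint64_t get_ticks_raw_usec() override { return next_raw; }
};
```

Feeding raw readings of 100, 90 and 120 through `ScriptedClock` yields 100, 100 and 120. Note that this sketch is not thread-safe, which is the concern raised later in the conversation.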