Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The LuaJIT profiler fails to collect samples under Linux. #1763

Open
ImagicTheCat opened this issue Jan 8, 2022 · 7 comments
Open

The LuaJIT profiler fails to collect samples under Linux. #1763

ImagicTheCat opened this issue Jan 8, 2022 · 7 comments
Labels
bug Something isn't working help wanted Extra attention is needed Linux

Comments

@ImagicTheCat
Copy link
Contributor

ImagicTheCat commented Jan 8, 2022

I suspect this is because of a bad interaction of multi/single-threaded signal handling between Audio.cpp:105 and lj_profile.c:191, causing the SIGPROF signal from not being received.

A workaround is to disable the audio module in conf.lua.

The same issue can probably happen when creating LÖVE threads.

@slime73 slime73 added bug Something isn't working help wanted Extra attention is needed labels Jan 13, 2022
@slime73
Copy link
Member

slime73 commented Jan 16, 2022

I don't have a Linux setup so someone else will have to investigate this.

@MikuAuahDark
Copy link
Contributor

Could you try again with the latest branch? I think b3b1317 fixes it, but not sure.

@slime73
Copy link
Member

slime73 commented Apr 18, 2022

That commit will only change things in situations where the code would early-exit (for example if the audio module doesn't fully initialize, or if Thread:start on a Lua thread returned false) so I'm not sure if it will affect this issue.

@pfirsich
Copy link
Contributor

pfirsich commented May 14, 2023

Sorry if I have misunderstood everything, but I tried to reproduce problem like this and I couldn't:

local profile = require("jit.profile")

function love.load()
    profile.start("l", function(thread, samples, vmstate)
        print(thread, samples, vmstate)
    end)
end

function love.update()
    local num = 0
    for i = 1, 100000 do
        num = num * 2
    end
end

and conf.lua:

function love.conf(t)
    --t.modules.audio = nil
    --t.modules.sound = nil
end

Even when both lines are commented out, i.e. audio and sound are enabled, samples is >= 1 (mostly 1, sometimes 2).
I am Pop!OS 22.04, Kernel 6.2.6 and löve c8e7d4e (current main as of writing this). I'd be interested to look into this issue. Would you mind sharing a small reproduction of your problem?

Btw. I added that loop in love.update because I thought that maybe I need to do some work in Lua to be a more realistic test, but it didn't help.

@pfirsich
Copy link
Contributor

This issue made me investigate the reason for the signal blocking and I could not really figure it out.
It was introduced in a4a62f3, but I think the real fix for os.execute (#1047) in that commit was the removal of the SIGCHLD signal handler. If I remove the signal blocking (pthread_setprocmask), os.execute still works just fine.
I also made a working fix for reaping child processes of openURL (#1928), that does not break os.execute, because that was the reason the signal handler was introduced in the first place in f7d226d and 30ff7d4.

Does @bartbes remember the reason for blocking the signals? If it's still necessary to block them, it's likely not necessary to block them all, right?

@ImagicTheCat
Copy link
Contributor Author

ImagicTheCat commented May 15, 2023

To clarify my first post, signal handling and multi-threading do not interact well.

See this discussion on stackoverflow.

Also this comment from LuaJIT:

/* Sadly, we have to use a static profiler state.
**
** The SIGPROF variant needs a static pointer to the global state, anyway.
** And it would be hard to extend for multiple threads. You can still use
** multiple VMs in multiple threads, but only profile one at a time.
*/

Signal handling seems to be mostly considered as a per-process concept and it is easy to mess with it in a multi-threading scenario. I cannot reproduce the issue currently, but it doesn't mean that it is necessarily gone (as a kind of undefined behavior).

Edit: as pointed out in this issue, two potential causes of the problem has been worked on since, thus I don't mind if the issue is closed; I would report again if encountered.

@slime73 slime73 added the Linux label Jun 18, 2023
@bartbes
Copy link
Member

bartbes commented Aug 5, 2023

Does @bartbes remember the reason for blocking the signals? If it's still necessary to block them, it's likely not necessary to block them all, right?

I only have a vague recollection of why I did it. As far as I remember, the problem is simply that the signal handler can run on any thread, and the only way to make sure it happened on the main thread, was disabling handling on any other threads. Unfortunately, since (at least) OpenAL spawns its own threads, the only way to get those to co-operate, was to make sure it automatically inherited the right settings.

Sure, you can probably mostly get away with allowing some signals, but that will only cause them to potentially be handled by random threads.

I'm not sure if the situation has improved? Maybe there is a way to set it just for child threads? Some quick googling reveals that pthread_setattr_default_np and pthread_attr_setsigmask_np might allow you to keep the signal handlers on the main thread active.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed Linux
Projects
None yet
Development

No branches or pull requests

5 participants