Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meta-IRQs making cooperative threads preemptive #20255

Closed
wopu-ot opened this issue Oct 31, 2019 · 6 comments · Fixed by #20665
Closed

Meta-IRQs making cooperative threads preemptive #20255

wopu-ot opened this issue Oct 31, 2019 · 6 comments · Fixed by #20665
Assignees
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Milestone

Comments

@wopu-ot
Copy link
Collaborator

wopu-ot commented Oct 31, 2019

Describe the bug
Meta-IRQs can make cooperative threads preempt each other. The documentation on meta-IRQs mentions that "this breaks the promise made to cooperative threads" but it is unexpected that this implies not only that meta-IRQs can preempt cooperative threads, but also that cooperative threads can preempt each other.

To Reproduce
Consider 3 threads with decreasing priorities { M, A, B }, where M is a meta-IRQ and A and B are cooperative threads. Furthermore, consider that A is waiting on a semaphore and B is executing when M is activated and preempts B. M gives the semaphore to A and exits. This causes A to be scheduled as soon as M exits, effectively preempting B. A and B are no longer cooperative with regard to each other. See metairq.c for an example to trigger the issue.

Expected behavior
Cooperative threads do not preempt each other.

Impact
Meta-IRQs cannot be used in cases where they would make sense, e.g., to implement Mayflies in the Bluetooth stack. This can be worked around, but the remaining use cases for meta-IRQs are quite limited.

@wopu-ot wopu-ot added the bug The issue is a bug, or the PR is fixing a bug label Oct 31, 2019
@andrewboie
Copy link
Contributor

Is the expected behavior here that if B is running, IRQ M fires and releases a resource that A was sleeping on, that context returns back to B instead of A?

@wopu-ot
Copy link
Collaborator Author

wopu-ot commented Oct 31, 2019

Yes, returning to B (as a regular IRQ would do as described here) would be the expected behavior.

@andrewboie andrewboie added the priority: medium Medium impact/importance bug label Oct 31, 2019
@aescolar aescolar added this to the v2.1.0 milestone Nov 12, 2019
@aescolar aescolar added priority: high High impact/importance bug and removed priority: medium Medium impact/importance bug labels Nov 12, 2019
@aescolar
Copy link
Member

Raising to high following @Vudentz advice

@wopu-ot
Copy link
Collaborator Author

wopu-ot commented Nov 12, 2019

I believe that the issue could be solved in a rather simple way for the single-processor case. Upon switching from a cooperative thread to a meta-IRQ, a variable could remember which thread was preempted. That thread would then be the one to switch to once a non-meta-IRQ is to be scheduled.

For the SMP case, it is not clear to me what the extected behavior for the preempted cooperative thread is. Should it remain on the same CPU? Would it be allowed to migrate?

@andyross
Copy link
Contributor

Indeed, a code fix isn't particularly hard. We'd need a per-CPU tracking of the "last preempted cooperative thread", and on exit from a metairq stack just make that current (if it's still runnable, of course).

The SMP case of migration is an interesting edge case: I'd say no, we don't want to make it schedulable, because the reason it was cooperative in the first place was probably for locking and we don't want it to race with other threads.

And there are some other edge cases: these threads can be suspended, reach a timeout or be killed while they are preempted and we need to track that.

FWIW: my original hope was that we could just fix this with documentation. Cooperative threading was really only intended for locking, it doesn't work for interrupt bottom half functionality (where it should be preemptible by higher priority threads -- really that's why MetaIRQ exists in the first place) and it doesn't provide locking at all in SMP (where knowing you won't be preempted says nothing about what other CPUs will run). But @aescolar points out we have subsystems relying on cooperative behavior that will break if used with a metairq, so I guess a code fix is needed.

@andyross
Copy link
Contributor

My attempt at a fix is posted above, sorry for crossing with the other work. It passes the test in @wopu-ot 's PR, but it does not include it (can you resubmit that as a separate patch please?). Pushing for early review, it hasn't passed a full sanitycheck for me yet.

andyross pushed a commit to andyross/zephyr that referenced this issue Nov 14, 2019
When a MetaIRQ preempts a cooperative thread, that thread would be
added back to the generic run queue.  When the MetaIRQ is done, the
highest priority thread will be selected to run, which may obviously
be a cooperative thread of a higher priority than the one that was
preempted.

But that's wrong, because the original thread was promised that it
would NOT be preempted until it reached a scheduling point on its own
(that's the whole point of a cooperative thread, of course).

We need to track the thread that got preempted (one per CPU) and
return to it instead of whatever else the scheduler might have found.

Fixes zephyrproject-rtos#20255

Signed-off-by: Andy Ross <[email protected]>
carlescufi pushed a commit that referenced this issue Nov 15, 2019
When a MetaIRQ preempts a cooperative thread, that thread would be
added back to the generic run queue.  When the MetaIRQ is done, the
highest priority thread will be selected to run, which may obviously
be a cooperative thread of a higher priority than the one that was
preempted.

But that's wrong, because the original thread was promised that it
would NOT be preempted until it reached a scheduling point on its own
(that's the whole point of a cooperative thread, of course).

We need to track the thread that got preempted (one per CPU) and
return to it instead of whatever else the scheduler might have found.

Fixes #20255

Signed-off-by: Andy Ross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Kernel bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants