doc: interrupts: add documentation section for zero-latency IRQs #21765

Merged: 1 commit, Jan 9, 2020

Member

@ioannisg ioannisg commented Jan 8, 2020

Add a simple documentation section for the Zero-Latency
IRQs feature supported by the kernel.

Signed-off-by: Ioannis Glaropoulos [email protected]

Closes #21185

@ioannisg ioannisg added this to the v2.2.0 milestone Jan 8, 2020
@ioannisg ioannisg requested a review from cvinayak January 8, 2020 14:29
@ioannisg ioannisg added the RFC Request For Comments: want input from the community label Jan 8, 2020
As zero-latency interrupts may preempt the execution of kernel critical
section operations they shall not be allowed to use any kernel
functionality that may modify kernel structures, or generate exceptions
that need to be handled asynchronously (e.g. kernel panic).
Collaborator

@pabigot pabigot Jan 8, 2020

My understanding is that taking a spinlock on a uniprocessor involves blocking interrupts, and that ZLIs can bypass that, therefore they run while locks are being held. What I'd like to see, which relates to #21648, is a clear description of what you can do in a ZLI, referencing terms from #21678. "shall not...modify kernel structures" is imprecise.
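
The first point can be illustrated with a small stand-alone model. This is only a sketch, not Zephyr code: it assumes a Cortex-M-style priority mask in which a numerically lower priority is more urgent and a lock raises the mask to a threshold that sits just below the ZLI level. The priority values and the `sim_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical priority values, chosen only for illustration. */
#define ZLI_PRIO       0u /* zero-latency IRQ: most urgent level */
#define LOCK_THRESHOLD 1u /* mask level applied while a lock is held */
#define NORMAL_PRIO    2u /* an ordinary, kernel-managed IRQ */

/* Simulated mask register; 0 means "nothing is masked". */
static uint32_t mask_level;

/* Simulated uniprocessor spinlock: taking it only raises the mask. */
static uint32_t sim_lock(void)
{
    uint32_t key = mask_level;

    mask_level = LOCK_THRESHOLD;
    return key;
}

static void sim_unlock(uint32_t key)
{
    mask_level = key;
}

/* An IRQ is delivered if masking is off, or if its priority is more
 * urgent (numerically lower) than the current mask level.
 */
static bool irq_delivered(uint32_t prio)
{
    return mask_level == 0u || prio < mask_level;
}

/* Reports whether an IRQ of the given priority would be delivered
 * while the simulated lock is held.
 */
bool delivered_while_locked(uint32_t prio)
{
    uint32_t key = sim_lock();
    bool delivered = irq_delivered(prio);

    sim_unlock(key);
    return delivered;
}
```

In this model `delivered_while_locked(NORMAL_PRIO)` is false and `delivered_while_locked(ZLI_PRIO)` is true, which is the sense in which a ZLI runs while a lock is held.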

Clearly a ZLI must at least avoid invoking anything that isn't isr-ok.

I would expect a ZLI should be able to queue material to be processed by a thread, e.g. invoke k_sem_give() or k_queue_put(). So it has to be able to invoke things that are reschedule, and so can modify some kernel structures, right?

However, what if there's a meta-IRQ thread that gets made ready by an operation in a ZLI? At the next reschedule point the meta-IRQ thread will be selected, regardless of scheduler lock and current thread type (e.g. cooperative non-meta-IRQ).

Is there any chance that this could enable a context switch while the spinlock is still held? Not at the point the ZLI completes, unless that's a reschedule point (is it? nothing specifies) but later, within the locked code that was interrupted when that code issues k_sem_give() which would not normally cause a context switch because it was a non-meta-IRQ cooperative thread?

BTW: Does k_is_in_isr() correctly identify that one is in a ZLI? (As opposed to thread mode; clearly it couldn't distinguish ZLI from non-ZLI ISR.)

Member Author

> My understanding is that taking a spinlock on a uniprocessor involves blocking interrupts, and that ZLIs can bypass that, therefore they run while locks are being held. What I'd like to see, which relates to #21648, is a clear description of what you can do in a ZLI, referencing terms from #21678. "shall not...modify kernel structures" is imprecise.
>
> Clearly a ZLI must at least avoid invoking anything that isn't isr-ok.

Yes, but since a ZLI is an ISR, this assumption is already satisfied.

> I would expect a ZLI should be able to queue material to be processed by a thread, e.g. invoke k_sem_give() or k_queue_put(). So it has to be able to invoke things that are reschedule, and so can modify some kernel structures, right?

This is not necessary if we accept that ZLI will be a sub-arch feature. Currently it is used in the Nordic LE Controller, and it does not touch the kernel.

> However, what if there's a meta-IRQ thread that gets made ready by an operation in a ZLI? At the next reschedule point the meta-IRQ thread will be selected, regardless of scheduler lock and current thread type (e.g. cooperative non-meta-IRQ).

This is not applicable if we don't allow ZLIs to touch the kernel.

> Is there any chance that this could enable a context switch while the spinlock is still held? Not at the point the ZLI completes, unless that's a reschedule point (is it? nothing specifies) but later, within the locked code that was interrupted when that code issues k_sem_give() which would not normally cause a context switch because it was a non-meta-IRQ cooperative thread?

ZLIs are wrapped in the standard ISR wrapper, so there is a reschedule check at the end anyway. But this should not be an issue if ZLI code does nothing to generate conditions for a reschedule, right?

> BTW: Does k_is_in_isr() correctly identify that one is in a ZLI? (As opposed to thread mode; clearly it couldn't distinguish ZLI from non-ZLI ISR.)

Correct, it does not distinguish between the two types of ISRs.

Collaborator

>> I would expect a ZLI should be able to queue material to be processed by a thread, e.g. invoke k_sem_give() or k_queue_put(). So it has to be able to invoke things that are reschedule, and so can modify some kernel structures, right?

> This is not necessary, if we accept that ZLI will be a sub-arch feature. Currently it is used in the Nordic LE Controller and it does not touch the kernel.

OK. As long as nobody ever wants to use a ZLI to capture and quickly clear an event that requires a thread be notified that the event happened, great.
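
For the use case raised here, one pattern that stays within the "ZLI does not touch the kernel" rule is to let the ZLI record the event in application-owned data and hand the notification to a regular-priority ISR. The following is a stand-alone sketch of that idea, not something this PR documents: all `sim_*` names are hypothetical, the pending flag stands in for pending a lower-priority interrupt in hardware, and the returned count stands in for whatever the regular ISR would pass to a kernel call such as k_sem_give().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Application-owned state; no kernel API reads or writes it. */
static atomic_uint events_captured;      /* incremented by the ZLI */
static atomic_bool deferred_irq_pending; /* stands in for a pended IRQ */

/* Simulated ZLI: capture the event and request deferred handling.
 * It deliberately calls no kernel API.
 */
void sim_zli_handler(void)
{
    atomic_fetch_add(&events_captured, 1u);
    atomic_store(&deferred_irq_pending, true);
}

/* Simulated regular-priority ISR. A real one is masked during kernel
 * critical sections, so it could safely call an isr-ok kernel API to
 * wake a thread. Here it only returns how many events it drained.
 */
unsigned int sim_deferred_isr(void)
{
    if (!atomic_exchange(&deferred_irq_pending, false)) {
        return 0u; /* nothing was pended */
    }
    return atomic_exchange(&events_captured, 0u);
}
```

The cost of this split is that the thread notification has ordinary interrupt latency; only the capture itself is zero-latency.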

>> BTW: Does k_is_in_isr() correctly identify that one is in a ZLI? (As opposed to thread mode; clearly it couldn't distinguish ZLI from non-ZLI ISR.)

> Correct, it does not distinguish between the two types of ISRs.

I should have been more clear: does k_is_in_isr() return true when invoked from a ZLI? I wasn't aware that ZLIs used the standard ISR wrapper, so I guess the answer is probably "yes".

Contributor

I wouldn't mind a somewhat stronger statement here either. A ZLI will defeat any synchronization the kernel tries to do. Effectively the ZLI code lives "outside" the kernel and may not do anything that modifies or even inspects state that the kernel expects to be synchronized. We'll never be able to document all that rigorously, so IMHO the docs should make the "outside" requirement clear. Something like:

"Zero latency interrupts are expected to be used to manage hardware events directly, and not to interoperate with the kernel code at all. They should treat all kernel APIs as undefined behavior (i.e. an app that uses them in a ZLI is responsible for directly verifying correct behavior), and they should not modify any data inspected by kernel APIs invoked from normal Zephyr contexts."

Member Author

@ioannisg ioannisg Jan 8, 2020

Thanks for the feedback, @andyross. Well, I don't see any harm in inspecting kernel structures, as long as no modification is done, but I admit the value of a stronger statement here. I'll update it.

Contributor

You can't break the kernel by reading its data, but you can still make wrong decisions. Consider inspecting things like "Does this semaphore have any waiting threads?" -- you may have interrupted the kernel in the middle of a semaphore operation and the answer may depend on which state you inspect.

Even things like "What time is it?" can break. The tick count is 64 bit, and you may have interrupted the timer ISR while it had written only half of a 32-bit-rolled-over value. (FWIW: exactly this issue of non-atomic time updates vs. interrupts was the source of one of the first really hard embedded bugs I ever figured out, on an 8051 system about 21 years ago, heh).

Basically: we should promise that nothing works. And if something can be made to work, we should clarify that it's the application author's job to do all the validation.
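
The tick-count case above can be reproduced deterministically in a few lines. This is a simplified stand-alone model, not the kernel's timer code: it assumes the 64-bit count is kept as two 32-bit words that are updated one after the other, and the `tick64` type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 64-bit tick count held as two 32-bit words. */
struct tick64 {
    uint32_t lo;
    uint32_t hi;
};

static uint64_t tick_read(const struct tick64 *t)
{
    return ((uint64_t)t->hi << 32) | t->lo;
}

/* One increment, split into the two stores a 32-bit CPU performs. */
static void tick_inc_low(struct tick64 *t)
{
    t->lo += 1u; /* wraps to 0 when lo was UINT32_MAX */
}

static void tick_inc_carry(struct tick64 *t)
{
    if (t->lo == 0u) {
        t->hi += 1u; /* propagate the carry into the high word */
    }
}

/* Value a ZLI would read if it fired between the two stores. */
uint64_t tick_seen_mid_update(void)
{
    struct tick64 t = { .lo = UINT32_MAX, .hi = 0u };
    uint64_t seen;

    tick_inc_low(&t);
    seen = tick_read(&t); /* simulated ZLI reads here */
    tick_inc_carry(&t);
    return seen;
}

/* Value read once the update has completed. */
uint64_t tick_after_update(void)
{
    struct tick64 t = { .lo = UINT32_MAX, .hi = 0u };

    tick_inc_low(&t);
    tick_inc_carry(&t);
    return tick_read(&t);
}
```

Starting from 0xFFFFFFFF, the completed update yields 0x100000000, but the read taken between the two stores sees 0, so time appears to jump backwards by roughly 2^32 ticks.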

Contributor

I concur with @andyross on this.

Member Author

Updated, as agreed, based on the input from @andyross.
I only kept the original phrase about synchronous exceptions not being allowed, to stress that point a bit more.

@ioannisg ioannisg removed the RFC Request For Comments: want input from the community label Jan 8, 2020
@nashif nashif merged commit c393f3f into zephyrproject-rtos:master Jan 9, 2020
Successfully merging this pull request may close these issues.

zero-latency IRQ behavior is not documented?
6 participants