Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pulse_cnt now prevents user PCNT code from accounting for pending overflow interrupt. (IDFGH-8726) #10167

Closed
wiegleyj opened this issue Nov 12, 2022 · 5 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@wiegleyj
Copy link

pcnt_unit_t use to be a typedef enum (in other words, the "handle" was a simple number.)

Thus, you could do:
if (PCNT.int_st.val & BIT(unit)) {

so that you could correctly account for overflows and make a counter that could handle more than 16-bits.

Now the handle has been changed to pcnt_unit_handle_t which is typedef struct pcnt_unit_t * that is no longer a simple int or a pointer to an int. It's a pointer to a struct. The big problem is that the definition of struct pcnt_unit_t is hidden in pulse_cnt.c and thus not available to user code.

There is no way to get at pcnt_unit_handle_t->unit_id to be able to detect pending overflow interrupts and you cannot make a reliable counter that can handle greater than 16-bits as a result.

Am I missing some better/correct method to account for pending overflows while in a spinlock correctly? I think there needs to be an accessor function to get at elements of struct pcnt_unit_t or it needs to be moved into a .h header file so that its fields are accessible. (An accessor to the spinlock and unit_id fields would seem to be sufficient. (spinlock for protecting access per pcnt_unit instead of having to make another, and unit_id so you can get into PCNT.int_st and such correctly.)

@github-actions github-actions bot changed the title pulse_cnt now prevents user PCNT code from accounting for pending overflow interrupt. pulse_cnt now prevents user PCNT code from accounting for pending overflow interrupt. (IDFGH-8726) Nov 12, 2022
@espressif-bot espressif-bot added the Status: Opened Issue is new label Nov 12, 2022
@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Nov 14, 2022
@suda-morris
Copy link
Collaborator

@wiegleyj To compensate the counter overflow, I think you can add a watchpoint and register a callback. e.g. https://github.com/espressif/esp-idf/blob/master/tools/unit-test-app/components/test_utils/ref_clock_impl_rmt_pcnt.c#L67-L72

if (PCNT.int_st.val & BIT(unit)) {

Sorry, we don't recommend users read/write registers from the application layer in this way (one reason is that, the register name/layout can be changed in other ESP chips). Please check if the callback functions can meet your requirement.

Now the handle has been changed to pcnt_unit_handle_t which is typedef struct pcnt_unit_t * that is no longer a simple int or a pointer to an int. It's a pointer to a struct.

That's right. One of the goals of the new PCNT driver is to hide the implementation details. The user doesn't have to know the underlying PCNT channel number. The driver will manage the resource pool and allocate the channel for you.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: In Progress Work is in progress labels Nov 14, 2022
@wiegleyj
Copy link
Author

Nope. Your idea won't work. Your comment is addressing the basic problem of "how do I attach an interrupt to be notified when limits are reached". It does not do it in a way that prevents errors from occurring due to shared data and the fact that the hardware continues to count and overflow while you are adding together multiple pieces of accounting data.

There is a race condition that you have to be able to take care of.

I totally get the idea of hiding the implementation details and that concept is great but you are going to have to provide an atomic way to access two pieces of information, 1) the current count register value, and 2) whether an overflow is pending but hasn't been serviced by an interrupt.

To make software counters that can handle more than 16-bits you need an "on reach" interrupt to increment/decrement a user-space accumulator variable but this interrupt and the user code for getting current total count needs to be synchronized...

struct encoder_counter {
	int counter;
	portMUX_TYPE lock;
};

// this interrupt service routine assumes the ONLY watch points registered are the LOW and HIGH overflow LIMITS.
static bool pcnt_on_reach(pcnt_unit_handle_t unit, const pcnt_watch_event_data_t *edata, void *user_ctx) {
	struct encoder counter*encoder = (struct encoder counter*) user_ctx;
	taskENTER_CRITICAL_ISR(&(encoder_counter->lock)); 
	encoder_counter->counter += edata->watch_point_value; // adjust userland accumulator
	taskEXIT_CRITICAL_ISR(&(encoder_counter->lock));
	return false;
}

int encoder_getCount(struct encoder_counter *encoder) {
	int result;
	taskENTER_CRITICAL(&(encoder_counter->lock)); // computation of "accumulator PLUS register" needs to be atomic.
	pcnt_unit_get_count(encoder_counter->pcnt_unit, &result); // get the current count in the register (hardware is still spinning it, could overflow just before this line or right after it.)

        // right here you have a race condition. You're in a critical section. The interrupt may be fired off but is being prevented
        // from completing because you can't allow the user accumulator count to change while you are reading the
        // pcnt_unit_get_counter (or vice versa); the two should be atomically tied together.
        // BUT the pcnt_unit_get_counter is still being changed by hardware non-atomically. You need to know
        // if you got a value AND if an overflowed occurred. because if you got back 0 from the register it could really mean
        // the "count" there is HIGH_LIMIT and hasn't been serviced by an interrupt routine. So... you want something like....

        if (PCNT.int_st.val & (0x1<<encoder_counter->pcnt_unit->unit_id)) { // check if overflow is pending
		pcnt_unit_get_count(encoder_counter->pcnt_unit, &result); // reread the count, the overflow might have occurred after the first read and the check of the interrupt status. Which invalidates the original read.
		if (PCNT.status_unit[encoder_counter->pcnt_unit->unit_id].cnt_thr_h_lim_lat_un) // figure out the reason for the overflow
			result += PCNT_HIGH_LIMIT;        // compensate with what the interrupt is *going* to do once allowed
		else if (PCNT.status_unit[encoder_counter->pcnt_unit->unit_id].cnt_thr_l_lim_lat_un)
			result += PCNT_LOW_LIMIT;
	}
	result += encoder_counter->counter;
	taskEXIT_CRITICAL(&(encoder->lock));
	return result;
}

And even that low level system has a problem if the limits are small enough and the count is advancing fast enough because this function could get blocked by a higher priority task in the middle and the counter could spin over several times and you (and the interrupt) would have no way of knowing how many times it overflowed while waiting to be chosen for running again.

To Hide this implementation from the user you will need to provide an atomic function such as:
esp_err_t pcnt_unit_get_count(pcnt_unit_handle_t pcnt_unit_handle_t, unit, int *value, const pcnt_watch_event_data_t *pending)

This, in one inseparable shot gives you the ability to determine what the total value of the count really is at the moment the data was retrieved. You still need the critical section which protects the tied nature of the userland counter and the register value. But now you have the ability to account for the tied nature of userland counter, register value, AND if there is an overflow waiting to be accounted for.

I think even that may have problems because there the overflow status register and the pulse count register are two separate registers, and you need to retrieve both atomically as well. But at the low level I think that can be achieved in a fashion similar to above. "get-test-get"

You have either hidden too much of the implementation or you have not provided a method for exposing sufficient information. But as it stands in 5.0 there is no way, that I can see, to make a reliable accumulating counter (more than 16-bits) that isn't subject to errors equal to the high or low limit caused by overflow during value acquisition. You can reduce your error magnitude by reducing the limits but that increases interrupt inefficiency.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Reviewing Issue is being reviewed labels Nov 15, 2022
@suda-morris
Copy link
Collaborator

suda-morris commented Nov 15, 2022

@wiegleyj Thank you for this comprehensive explanation! Now I see the issue is between the overflow happening and the watch-point interrupt getting serviced, if the user reads the value by pcnt_unit_get_count, we haven't compensated that loss.

So if we can move the above strategy to the pcnt_unit_get_count

esp_err_t pcnt_unit_get_count(pcnt_unit_handle_t unit, int *value)
{
    int result = 0;
    pcnt_group_t *group = NULL;
    ESP_RETURN_ON_FALSE_ISR(unit && value, ESP_ERR_INVALID_ARG, TAG, "invalid argument");
    group = unit->group;
    pcnt_hal_context_t *hal = &group->hal;
    int unit_id = unit->unit_id;
    uint32_t intr_status = pcnt_ll_get_intr_status(hal->dev);
    // overflow happened, but interrupt is not serviced yet
    if (intr_status & PCNT_LL_UNIT_WATCH_EVENT(unit_id)) {
        uint32_t event_status = pcnt_ll_get_event_status(hal->dev, unit_id);
        // compensate the overflow loss
        if (event_status & (1 << PCNT_LL_WATCH_EVENT_LOW_LIMIT)) {
            result = unit->low_limit;
        } else if (event_status & (1 << PCNT_LL_WATCH_EVENT_HIGH_LIMIT)) {
            result = unit->high_limit;
        }
    }
    result += pcnt_ll_get_count(hal->dev, unit_id);
    *value = result;
    return ESP_OK;
}

will that be OK for you?

@wiegleyj
Copy link
Author

Close... You clearly understand the issue now, but I think there is still a race condition since this solution doesn't lock out the interrupt from being serviced while it runs. So...

esp_err_t pcnt_unit_get_count(pcnt_unit_handle_t unit, int *value)
{
    int result = 0;
    pcnt_group_t *group = NULL;
    ESP_RETURN_ON_FALSE_ISR(unit && value, ESP_ERR_INVALID_ARG, TAG, "invalid argument");
    group = unit->group;
    pcnt_hal_context_t *hal = &group->hal;
    int unit_id = unit->unit_id;
    uint32_t intr_status = pcnt_ll_get_intr_status(hal->dev);
    // overflow happened, but interrupt is not serviced yet

// For instance here... Now we have captured the interrupt status from time T_0.
// If the interrupt happens to occur now, or in the next few lines, it would cause
// the user's ISR to run which would increment/decrement their counter
// This would effectively change their counter as though it got serviced.

// Then in the next lines this function would ALSO add that overflow amount
// to the result we send back. effectively doubling the overflow compensated
// for by at the user code level.


    if (intr_status & PCNT_LL_UNIT_WATCH_EVENT(unit_id)) {
        uint32_t event_status = pcnt_ll_get_event_status(hal->dev, unit_id);
        // compensate the overflow loss
        if (event_status & (1 << PCNT_LL_WATCH_EVENT_LOW_LIMIT)) {
            result = unit->low_limit;
        } else if (event_status & (1 << PCNT_LL_WATCH_EVENT_HIGH_LIMIT)) {
            result = unit->high_limit;
        }
    }
    result += pcnt_ll_get_count(hal->dev, unit_id);
    *value = result;
    return ESP_OK;
}

I don't fully understand the pcnt__ll libraries but I would suggest something like...

esp_err_t pcnt_unit_get_count(pcnt_unit_handle_t unit, int *value)
{
    int result = 0;
    pcnt_group_t *group = NULL;
    ESP_RETURN_ON_FALSE_ISR(unit && value, ESP_ERR_INVALID_ARG, TAG, "invalid argument");
    group = unit->group;
    pcnt_hal_context_t *hal = &group->hal;
    int unit_id = unit->unit_id;

GET_A_CRITICAL_LOCK_HERE_THAT_PREVENTS pcnt_default_isr(void *args) from calling on_reach()

result = pcnt_ll_get_count(hal->dev, unit_id);   // get the current counter amount

// the status could be clear or it could change now. we can't stop the hardware from doing its thing.

    uint32_t intr_status = pcnt_ll_get_intr_status(hal->dev);
    // overflow happened, but interrupt is not serviced yet
    if (intr_status & PCNT_LL_UNIT_WATCH_EVENT(unit_id)) {

// yep, the hardware did its thing and changed the overflow (BUT the user code callback didn't run)
result = pcnt_ll_get_count(hal->dev, unit_id);   // get the lastest counter amount that the hardware updated because our first fetch may have been from before the overflow ocurred.

// add the comensation amount
        uint32_t event_status = pcnt_ll_get_event_status(hal->dev, unit_id);
        // compensate the overflow loss
        if (event_status & (1 << PCNT_LL_WATCH_EVENT_LOW_LIMIT)) {
            result += unit->low_limit;   // NOTICE WE ARE += now, not just =
        } else if (event_status & (1 << PCNT_LL_WATCH_EVENT_HIGH_LIMIT)) {
            result += unit->high_limit; // NOTICE WE ARE += now, not just =
        }
    }
RELEASE_CRITICAL_LOCK_THAT_PREVENTS  pcnt_default_isr(void *args) from calling on_reach()

    *value = result;
    return ESP_OK;
}

This will also require changes to pcnt_default_isr. I think something like...

IRAM_ATTR static void pcnt_default_isr(void *args)
{
    bool need_yield = false;
    pcnt_unit_t *unit = (pcnt_unit_t *)args;
    int unit_id = unit->unit_id;
    pcnt_group_t *group = unit->group;
    pcnt_watch_cb_t on_reach = unit->on_reach;

    uint32_t intr_status = pcnt_ll_get_intr_status(group->hal.dev);
    if (intr_status & PCNT_LL_UNIT_WATCH_EVENT(unit_id)) {

GET_CRITICAL_LOCK_HERE (because you are about to clear the status that the `pcnt_get_count` uses.

        pcnt_ll_clear_intr_status(group->hal.dev, PCNT_LL_UNIT_WATCH_EVENT(unit_id));
        uint32_t event_status = pcnt_ll_get_event_status(group->hal.dev, unit_id);
        // iter on each event_id
        while (event_status) {
            int event_id = __builtin_ffs(event_status) - 1;
            event_status &= (event_status - 1); // clear the right most bit

            // invoked user registered callback
            if (on_reach) {
                pcnt_watch_event_data_t edata = {
                    .watch_point_value = unit->watchers[event_id].watch_point_value,
                    .zero_cross_mode = pcnt_ll_get_zero_cross_mode(group->hal.dev, unit_id),
                };
                if (on_reach(unit, &edata, unit->user_data)) {
                    // check if we need to yield for high priority task
                    need_yield = true;
                }
            }
        }

RELEASE_CRITICAL_LOCK_HERE

    }
    if (need_yield) {
        portYIELD_FROM_ISR();
    }
}

It might be possible to increase the efficiency of the lock if you can lock on the individual unit's PCNT spinlock and not on the group lock. I don't fully understand the dynamics of the abstraction library and the low level registers. pcnt_get_count only messes with one unit and thus should use the unit's individual spinlock I think. But I'm less clear about the pcnt_default_isr which might not support locking an individual spinlock without messing up the other units or allowing another unit to mess the current one up. I haven't analyzed it enough. Basically... globally locking all the units/ISR will certainly work. Locking just the individual unit's get_count and on_reach would be more efficient.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress Status: Reviewing Issue is being reviewed Resolution: NA Issue resolution is unavailable labels Nov 20, 2022
espressif-bot pushed a commit that referenced this issue Dec 4, 2022
@wiegleyj
Copy link
Author

Ah... I see you did all the accumulation within the library and effectively extended the counter to be a signed 32-bit for the user without them having to do anything other than assign a true value to a struct member when creating the counter.

That works for me.

Thanks for the taking the time, especially as it was up against all the work up against the very immediate release of 5.0. Very responsive. Thanks!

suda-morris added a commit to suda-morris/idf-extra-components that referenced this issue Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

3 participants