Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New encoder logic with debouncing #26723

Merged

Conversation

dbuezas
Copy link
Contributor

@dbuezas dbuezas commented Jan 23, 2024

Description

This PR is a follow up to #26501.
After multiple tests using interrupts, it became obvious that the main problems I observe with the encoder are related to contact bouncing.
Since other reported in #26501 that the old step skipping workarounds were needed in their machines to avoid unresponsiveness when changing direction, I conclude that the contact bouncing issue is present in many boards.

Luckily, update_buttons is called extremely frequently (I measure 180kHz in my SKR3) so a very simple debouncing scheme can be used without the need for interrupts.

This PR:

  • Adds debouncing logic for the encoder: encoder inputs need to be stable for 2ms before their value is used.
  • Disables clicks if too close to a scroll (100ms): These are almost always accidental
  • Removes half step timeout logic for half clicks and other workarounds: These result in the "clicky" haptic of the encoder getting "out of synch" with UI scrolls, and the symptoms it treated are now solved via debouncing.

Requirements

Any board with a rotary encoder.

Benefits

Rock solid encoder readings. No false readings and no missed steps. The encoder finally feels right.

Configurations

Related Issues

#26501
#26605

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 23, 2024

Before

Very jumpy, often goes backwards.

WhatsApp.Video.2024-01-23.at.22.52.57.mp4

After

Direction and responsiveness are fixed.

WhatsApp.Video.2024-01-23.at.22.58.01.mp4

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 23, 2024

Note: I avoided using interrupts or any type of counting to cover slower boards with less interrupt pins.

The changes increase RAM usage by 14 bytes as static vars:

  • 4 bytes store the last time the encoder moved (to inhibit clicks)
  • 4 bytes for the time of the last change of EN_A
  • 4 bytes same for EN_B
  • 1 byte for the last undebounced (raw) encoder pins' state
  • 1 byte for the current debounced encoder state

@thisiskeithb thisiskeithb added Needs: Testing Testing is needed for this change C: LCD & Controllers labels Jan 24, 2024
@thisiskeithb
Copy link
Member

I just put a call out in our #testing channel on Discord to collect feedback on this PR.

@The-EG
Copy link
Contributor

The-EG commented Jan 25, 2024

Testing this with DWIN_MARLINUI_LANDSCAPE on a genuine DWIN (not that it should matter, an encoder is an encoder). No issues. For me the effect is more subtle, but it is more consistent for sure.

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 25, 2024

Thanks for the test @The-EG,
do you have a 32 or 16 bit board? Any issues when reversing direction (i.e the ui not responding to the first step)

@The-EG
Copy link
Contributor

The-EG commented Jan 25, 2024

do you have a 32 or 16 bit board? Any issues when reversing direction (i.e the ui not responding to the first step)

32bit. I didn't have pronounced issues in the first place, so I think the improvement is less noticeable for me, but the biggest thing I noticed is when making a large change (ie. spinning the encoder quickly), the reaction isn't as jumpy and feels more in line with the motion. I can't say I noticed any issues with reversing, either before or after.

I'll see if I can try this on my CR10_STOCKDISPLAY later, that one had more issues than this DWIN (menu selection going too far/not far enough, etc.), even on the exact same board. I suspect the improvement may be more noticeable there.

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Jan 26, 2024

Sadly the same problem as with #26501 for me: The encoder loses steps when changing direction. When changing the scrolling direction, you need to go two encoder detents for the menu to actually move the first item up/down with this PR applied. Doesn't always happen, but once the bug is triggered, it seems to stay until the printer is powered down. Setup: BTT E3 Mini SKR v1.2 with CR10_STOCKDISPLAY, bugfix-2.1.x @ 22fc07d.

Unrelated: Am I reading the code wrong, or should BLOCK_CLICK_AFTER_MOVEMENT_MS rather be named BLOCK_MOVEMENT_AFTER_CLICK_MS? 🤔

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 26, 2024

That's a bummer, but it means that it is not just a lost step. There must be something else going on.

@XDA-Bam I have a couple questions you could help me with:

  • Does the defect happen in both directions (changing from left to right and also from right to left)?
  • Excluding this problem, does the encoder work more consistently otherwise in your printer?
  • Could you upload your configuration files here?

Maybe I can reproduce this phenomenon in my machine by matching your config.

Regarding BLOCK_CLICK_AFTER_MOVEMENT_MS, it prevents clicks which are too close to a movement (not the other way around)

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Jan 26, 2024

That's a bummer, but it means that it is not just a lost step. There must be something else going on.

@XDA-Bam I have a couple questions you could help me with:

* Does the defect happen in both directions (changing from left to right and also from right to left)?

Yes, both.

* Excluding this problem, does the encoder work more consistently otherwise in your printer?

I think so, yes. More consistent overall.

* Could you upload your configuration files here?

Attached below.
Marlin.zip

Maybe I can reproduce this phenomenon in my machine by matching your config.

Regarding BLOCK_CLICK_AFTER_MOVEMENT_MS, it prevents clicks which are too close to a movement (not the other way around)

OK, thanks :)

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 27, 2024

@XDA-Bam could you make a video where you show the phenomenon?
Please also include slowly going multiple steps in the same direction too.
Thank you!

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Jan 28, 2024

Video is attached. You will see that this time, it took me quite some time (~25 s) to trigger the bug while scrolling the menus. After that, it stayed and affected all direction changes. In other occasions, the bug was triggered sooner. I'm not sure if I also encountered instances where it was triggered immediately after booting the printer, but there might have been some.

2024-01-28_Encoder.Bug.mp4

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 29, 2024

I'll keep trying to reproduce this for some days.
I'm starting to suspect that in that type of encoder, internal haptic dents can slide a bit with respect to the encoder lines, resulting in the encoder landing exactly 1 step offset. The zero position is reset upon restarting, so this explains the bug.

If I can't reproduce it this week, I'll assume this is the case and I'll re-add the reset trick to this branch, but combined with the debouncer.

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Jan 29, 2024

I'll keep trying to reproduce this for some days. I'm starting to suspect that in that type of encoder, internal haptic dents can slide a bit with respect to the encoder lines, resulting in the encoder landing exactly 1 step offset. The zero position is reset upon restarting, so this explains the bug.

It's possible. I just managed to "un-trigger" the bug again after being triggered at start-up. Just scrolled around fast for 15 seconds, and bam: Menu moves and encoder steps were aligned correctly again, no lost moves after direction changes. A bit later, the bug re-triggered. So it can fix itself without power cycling the printer.

However, I played around with different fixes assuming exactly this problem (single pulse offset) on your previous PR and could never prevent the bug from appearing. But I might have "fixed" it incorrectly, of course 😅

I remember that I zeroed in on

int8_t fullSteps = encoderDiff / epps;

because for encoders with multiple pulses per step, we're effectively rounding towards zero here if the encoder is misaligned. In such a case, we're losing up to epps-1 pulses and are not updating encoderDiff and encoderPosition correctly in the following couple of lines.

@dbuezas dbuezas force-pushed the dbuezas/encoder-improvements-v2 branch 2 times, most recently from 9db30f9 to 9bf1887 Compare January 30, 2024 20:27
@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 30, 2024

I committed a fix to this PR would you please give it another go @XDA-Bam ?


I just managed to "un-trigger" the bug again

All right! this confirms it then.

In such a case, we're losing up to epps-1 pulses and are not updating encoderDiff and encoderPosition correctly

Exactly!

At first it works

physical dent x x x
direction -> -> -> <- <- <- <- <-
encoderDiff 0 1 2 3 0 -1 -2 -3 0
encoderPosition 0 1 0 <--- works fine

Then it slips and we land here

physical dent x x x
direction -> -> -> <- <- <- <- <-
encoderDiff -1 0 1 2 3 2 1 0 -1
encoderPosition 0 0 0 0 <--- going back and forth does nothing

The fix candidate

If the encoderDiff is still for >500ms, reset it to zero.
It is a lot simpler than the previous logic, and it shouldn't have the side effect of desynchronising the physical dents with the encoder position all the time (noticeable on encoders with 2 steps per dent)

physical dent x x x x
direction RESET -> -> -> <- <- <- <- <-
encoderDiff -1 0 1 2 3 0 -1 -2 -3 0
encoderPosition 0 0 1 0 <-- fixed

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Jan 31, 2024

@dbuezas With your latest changes, the bug has changed and it subjectively appears to trigger less often. What happens now is that you can still get occasions where one step is lost after a direction change. You can then go one encoder dent left->right->left->right and so on without the menu reacting. But as soon as you move two encoder dents in either direction, the bug disappears and the menu and encoder movements are synced again. The encoder then continues to function as expected until the bug triggers again, which can then be fixed again by moving two dents in either direction.


I also tried debugging this a bit and tested the following code before your latest commit (replaces this):

  #endif // ENCODER_RATE_MULTIPLIER

  int8_t fullSteps = encoderDiff / epps;
  int8_t encoderError = encoderDiff - fullSteps*epps;
  if (encoderError != 0) {
    if (encoderDiff < 0) {
      NOMORE(fullSteps, -1);
    }
    else {
      NOLESS(fullSteps, 1);
    }
  } 
  
  if (fullSteps != 0) {
    next_encoder_enable_ms = ms + BLOCK_CLICK_AFTER_MOVEMENT_MS;
    encoderDiff = encoderError != 0 ? 0 : encoderError;
    if (can_encode() && !lcd_clicked)
      encoderPosition += (fullSteps * encoderMultiplier);
    }
  }

This behaves very similar to the current code in this PR: I still get occasions where a single encoder dent is ignored on direction changes, but the bug fixes itself if I move a couple of dents in one direction. My conclusion from this would be, that encoderDiff is actually zero in cases where a rotation by one dent doesn't result in a move in the menu, because otherwise the above code should force abs(fullSteps)>0. I find that very odd, but I'm also not sure I fully understood the overall encoder code correctly.

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 31, 2024

I see. I didn't think of that.

But as soon as you move two encoder dents in either direction, the bug disappears

It should disappear as soon as the encoder is left alone for half a second.

Anyway, this explains why the old code had some extra condition in which a full step would be taken if the encoder stayed still for a bit beyond half a full step. The issue I have with the old code is that it assumes the frequency at which the function is called, so it is kind of brittle.

I'll try reading the original workaround soon, maybe it coexists fine with my debouncing logic and we can finally get all kinks ironed out :)

Thanks for your tests and feedback @XDA-Bam !

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Jan 31, 2024

It should disappear as soon as the encoder is left alone for half a second.

Forgot to test that earlier. Just checked and yes: The bug is also cleared once you leave the encoder "resting" for a second or so 👍

That also explains why it's much harder to trigger the bug after the latest commit: The half second reset will typically clear the bug before you even notice it's happened.

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 31, 2024

Exactly!
You can also reduce the resting time to 250ms, if that's perfect already maybe we don't need to re-add the old (more complicated) code.

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Jan 31, 2024

I tried the suggested #define RESET_ENCODER_AFTER_MS 250 and didn't feel a substantial difference. It may be quicker to clear the error, but it's closing in on a level where it becomes difficult to tell without motor-driven test setups and high-speed cams 😉 But I also saw no negative effects, so I would tend to keep the shorter reset time.

What is (still?) there is the occasional lost menu step when entering submenus. If I enter "Configuration" from the main menu, the first movement / encoder dent in that submenu is sometimes ignored. I think that was also present in the previous encoder improvement PR. Is there perhaps some deadtime hardcoded in some random location if lcd_clicked gets triggered?

@dbuezas
Copy link
Contributor Author

dbuezas commented Jan 31, 2024

Ok, then we're getting somewhere:)
I'll spend some time looking at the old logic and the new one and push some new code this week.

Is there perhaps some deadtime hardcoded in some random location

Yes, and it bothers me too!
There's a deadtime for all inputs, which makes sense for the click and button/touch based navigation, but not for the encoder. I'm actually using a modded version of this branch that eliminates that. I'll work on that after this pr :)

@dbuezas
Copy link
Contributor Author

dbuezas commented Feb 1, 2024

I tried the old reset algorithm and it makes my encoder (2 pulses per step) feel all mushy. If I move the wheel one step at normal speeds, it will often reset ecoderDiff midway, resulting in an unresponsive UI.

The old code resets every 100ms or less, so any speed rate of less than 10 pulses per second results in no UI response.
10 pulses per second are 5 steps/s in my encoder and 2.5 in yours, which explains why it makes my setup worse while yours is ok. Buuuut it also means the reset timeout should be measured per step and not per pulse!

I think a good balance is to have the reset timeout be 1/pulses_per_step seconds (250ms in your setup and 500ms in mine). Also to keep the old logic that would advance a step instead of resetting when encoderDiff = ±3

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Mar 14, 2024

Found the problem: One encoder line needs to be "low biased", if you will. Switching

if (btn_b_counter >= THRESHOLD) enc.b = 1;
else if (btn_b_counter <= -THRESHOLD) enc.b = 0;

to

if (btn_b_counter <= -THRESHOLD) enc.b = 0;
else if (btn_b_counter >= THRESHOLD) enc.b = 1;

fixes the lost steps on direction changes. More testing to follow.

@dbuezas
Copy link
Contributor Author

dbuezas commented Mar 14, 2024

I think it is just that the machine was booted while the encoder was sitting between steps. The two versions are exactly equivalent (the conditions in the if and the else can't be both true at the same time)

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Mar 14, 2024

I think it is just that the machine was booted while the encoder was sitting between steps. The two versions are exactly equivalent (the conditions in the if and the else can't be both true at the same time)

Yeah, just flashed the version without those two lines switched and now the problem is also not present. So rebooting did the job. Sorry for the confusion.

@dbuezas
Copy link
Contributor Author

dbuezas commented Mar 14, 2024

I'll hold my fingers crossed that you don't get any double stepping anymore.
Regarding the max speed, I'm confident it's either stability or speed. On the bright side, the stability that debouncing brings means that the 10x and 100x encoder multiplier will work well (it won't be randomly stoped by a backward setep).

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Mar 14, 2024

OK, I tested 2 ms and 1 ms of debounce time. Didn't notice an immediate difference between the two. Maybe very fast scrolling is somewhat better at 1 ms. Also:

  • Occasional lost steps at slow-to-medium and medium scrolling speed
  • Not 100% sure I'm getting no double stepping anymore at medium speed, but I didn't notice a clear case of doubling, yet
  • After moving around the menus for about two minutes, the encoder started to lose steps after a direction change (again 😐 )

@dbuezas
Copy link
Contributor Author

dbuezas commented Mar 14, 2024

Interesting. I have some follow up questions:

Occasional lost steps at slow-to-medium and medium scrolling speed

More or less than before (i.e older version of this PR and outside of the PR)?

didn't notice a clear case of doubling yet

This sounds good, It's not something we can 100% solve. It sound like it has improved, right?

lose steps after a direction change

I really don't see how the latest changes could impact that, do you believe it never happened before the last changes?

Thanks for all the testing @XDA-Bam !

@dbuezas dbuezas force-pushed the dbuezas/encoder-improvements-v2 branch from fa91f1a to 41c8892 Compare March 14, 2024 21:30
@XDA-Bam
Copy link
Contributor

XDA-Bam commented Mar 14, 2024

Interesting. I have some follow up questions:

Occasional lost steps at slow-to-medium and medium scrolling speed

More or less than before (i.e older version of this PR and outside of the PR)?

I'd say a bit more common than before the commits from yesterday/today. But with significant uncertainty on that comparison. It's not a huge difference. As written before, I also prefer some lost steps to double stepping, as it feels more predictable.

didn't notice a clear case of doubling yet

This sounds good, It's not something we can 100% solve. It sound like it has improved, right?

Maybe. Can't say for sure, sorry. I think that double stepping is overall hard to tell with my encoder, because the indents are a bit mushy and the "hills" between indents are pretty stable resting points, too. This muddies the counting of steps by feel.

lose steps after a direction change

I really don't see how the latest changes could impact that, do you believe it never happened before the last changes?

We definitely fixed that a couple of weeks ago and I have used the printer extensively in the meantime. I'm fairly confident it reappeared today (or yesterday, I didn't test the latest changes before switching to counter-based filtering).

Thanks for all the testing @XDA-Bam !

Happy to help. Thanks for tenaciously hunting all those bugs which often don't even affect your own machine(s)!

static int dt_us = 500; // the time delta in us between each run of this function
if (counter == 1000){
static millis_t last_ms;
dt_us = (now-last_ms);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming suggests dt_us is in µs, but the right side of the equation should result in ms values. Is this correct or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this counts the time in ms per 1000 calls. I do this to get a microsecond accuracy average.
So it is 1000 times too many ms, but the right value in microseconds. I do this to avoid both calling micros() and doing divisions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks.

@dbuezas
Copy link
Contributor Author

dbuezas commented Mar 16, 2024

I made some tests and the frequency at which the function is called is highly variable. I think makes my approach of getting an average less stable

@dbuezas dbuezas force-pushed the dbuezas/encoder-improvements-v2 branch from 41c8892 to 0467ea4 Compare March 16, 2024 15:17
@dbuezas
Copy link
Contributor Author

dbuezas commented Mar 16, 2024

And the problems are worse while printing. I further tested the combined a-b debouncing by @thinkyhead and it is the one working best for me while printing.
I have reverted my recent changes and left it all as it was, I think it is the version with the best chances of getting this MR merged.

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Mar 17, 2024

That's too bad, I really liked the filtering and counter approaches. I wish there was a less costly way to get sub-ms timing precision.

I just retested the combined a-b debouncing by thinkyhead (not the current state of the PR) with ENCODER_DEBOUNCE_MS 1 and 2 and it works okay-ish. Still occasional double stepping and some lost steps. Fast scrolling is ok with 1 and 2 ms, very fast scrolling is broken with 2 ms debounce.

I wondered why combined debouncing still works decently at all. After looking at the simulated results based on my last capture, I'd conclude that the performance heavily depends on the average data rate of of the function call. With a data rate of 12 kHz and 2 ms debounce, it looks like this (raw signal in light gray):

Per channel (12 kHz)
Plot_filtered_deadtime (per channel)_12kHz

Combined (12 kHz)
Plot_filtered_deadtime (combined)_12kHz

And with 120 kHz, it looks like this:

Per channel (120 kHz)
Plot_filtered_deadtime (per channel)_120kHz

Combined (120 kHz)
Plot_filtered_deadtime (combined)_120kHz

The assumed truth in the form of the output of the original floating point lowpass:
Plot_filtered_fp lowpass_120kHz

It looks like for this specific dataset the combined debouncing is better at the lower data rate, while the per-channel debouncing is closing in and maybe even taking the lead at the higher data rate. However, both kinda suck overall at 120 kHz (=higher data rate), at least with 2 ms debounce, which may negatively affect fast boards 🤔

@dbuezas
Copy link
Contributor Author

dbuezas commented Mar 17, 2024

I just retested the combined a-b debouncing by thinkyhead (not the current state of the PR)

Just to be clear, I reverted the changes, and a-b debouncing IS the state of the PR.

I also had high hopes, particularly for the counter approach since it worked so well when idling and was so cheap to compute.

But hey, we're threading really fine here, the difference any variant makes against the bugfix branch is so massive on noisy encoders that I don't want to delay it longer than necessary :)

Cool analysis btw

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Mar 17, 2024

I just retested the combined a-b debouncing by thinkyhead (not the current state of the PR)

Just to be clear, I reverted the changes, and a-b debouncing IS the state of the PR.

OK, I think i misunderstood what "combined debouncing" was intended to say. The way I understand the current code is that channels a and b are handled seperately and that each change in channel a blocks the reading/changing of only that channel for 2 ms. So, the channels are debounced independently (called "per channel" in my plots).

"Combined" in my plots from today means that any change in either channel blocks reading/changing both channels for 2 ms.

@dbuezas
Copy link
Contributor Author

dbuezas commented Mar 17, 2024

Oh, I see, my mistake. I intended to revert to the combined approach, but made a mistake and was 1 commit behind (added the missing commit now)

@thinkyhead
Copy link
Member

I intended to revert to the combined approach

Now reverted. The main question left is whether this improves on current code without introducing new side-effects. How is it looking in the blind user test?

@dbuezas
Copy link
Contributor Author

dbuezas commented Apr 15, 2024

@XDA-Bam reported he was somehow able to trigger the missing step when changing directions. I cannot explain how, as this code shouldn't be able of do that.
If that problem were to persist and we can't find why, we could always readd the "reset residual pulses after a second of inactivity" logic.

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Apr 15, 2024

@XDA-Bam reported he was somehow able to trigger the missing step when changing directions. I cannot explain how, as this code shouldn't be able of do that. If that problem were to persist and we can't find why, we could always readd the "reset residual pulses after a second of inactivity" logic.

As far as I remember (and could see from a quick glance at our more recent posts), the lost steps on direction changes affected the last counter-based approach. But due to reverting to simple dead time debouncing, these test results don't apply to the curent state of the PR. I'll try to re-test the curent code soon.

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Apr 16, 2024

OK, the current state of the PR on my CR10_STOCKDISPLAY is as follows:

Debounce Single steps, alternating directions 3-step slow moves Fast moves High speed moves
2 ms OK Some double stepping Some missed steps Many steps missed

I couldn't provoke any lost steps after direction changes. There's also no lost steps shortly after entering a submenu.

As far as I remember, per-channel debouncing gave better results for fast and especially high speed moves on my encoder.

@dbuezas
Copy link
Contributor Author

dbuezas commented Apr 16, 2024

Ok, fantastic.
The current head of the branch HAS the per channel debouncing

@dbuezas
Copy link
Contributor Author

dbuezas commented Apr 16, 2024

The joint channel debouncing may remove some double steps probably at the cost of more missed steps at high speed rotations

@XDA-Bam
Copy link
Contributor

XDA-Bam commented Apr 16, 2024

Ahh, I didn't even check the code. Just compiled & flashed 😅 Doesn't change my results table, of course.

@dbuezas
Copy link
Contributor Author

dbuezas commented Apr 16, 2024

You mentioned it in the past, it's really hard to distinguish by eye. More so weeks later! :)

@thinkyhead
Copy link
Member

The next release (2.1.3) is about two weeks off. I'll merge this now, giving us enough time to refine it over the coming fortnight.

@thinkyhead thinkyhead merged commit a3960df into MarlinFirmware:bugfix-2.1.x May 9, 2024
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: LCD & Controllers Needs: Testing Testing is needed for this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants