New encoder logic with debouncing #26723
Before: Very jumpy, often goes backwards. (video: WhatsApp.Video.2024-01-23.at.22.52.57.mp4)
After: Direction and responsiveness are fixed. (video: WhatsApp.Video.2024-01-23.at.22.58.01.mp4)
Note: I avoided using interrupts or any type of counting to cover slower boards with fewer interrupt pins. The changes increase RAM usage by 14 bytes in the form of static variables.
I just put a call out in our #testing channel on Discord to collect feedback on this PR.
Testing this with
Thanks for the test @The-EG,
32bit. I didn't have pronounced issues in the first place, so I think the improvement is less noticeable for me, but the biggest thing I noticed is that when making a large change (i.e., spinning the encoder quickly), the reaction isn't as jumpy and feels more in line with the motion. I can't say I noticed any issues with reversing, either before or after. I'll see if I can try this on my
Sadly, I see the same problem as with #26501: the encoder loses steps when changing direction. With this PR applied, after reversing the scrolling direction you need to turn two encoder detents before the menu actually moves the first item up/down. It doesn't always happen, but once the bug is triggered, it seems to stay until the printer is powered down. Setup: BTT E3 Mini SKR v1.2 with

Unrelated: Am I reading the code wrong, or should
That's a bummer, but it means that it is not just a lost step. There must be something else going on. @XDA-Bam I have a couple of questions you could help me with:
Maybe I can reproduce this phenomenon on my machine by matching your config. Regarding BLOCK_CLICK_AFTER_MOVEMENT_MS: it blocks clicks that come too soon after a movement (not the other way around).
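To make the direction of BLOCK_CLICK_AFTER_MOVEMENT_MS concrete, here is a minimal sketch, assuming a millisecond clock like millis(). The 100 ms value and the ClickGate struct are hypothetical; only next_encoder_enable_ms mirrors a name from code quoted later in this thread. A movement pushes the click-enable time forward, while a click never delays movement handling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical value for illustration only; not Marlin's configured default.
constexpr uint32_t BLOCK_CLICK_AFTER_MOVEMENT_MS = 100;

// Hypothetical helper modeled on next_encoder_enable_ms from the thread.
struct ClickGate {
  uint32_t next_encoder_enable_ms = 0;

  // A detected movement pushes the earliest allowed click time forward.
  void on_movement(uint32_t ms) {
    next_encoder_enable_ms = ms + BLOCK_CLICK_AFTER_MOVEMENT_MS;
  }

  // Clicks are accepted only after the blocking window has elapsed.
  // The signed difference keeps the comparison correct across millis() wraparound.
  bool click_allowed(uint32_t ms) const {
    return int32_t(ms - next_encoder_enable_ms) >= 0;
  }
};
```

Note that nothing here blocks movement after a click; only clicks following a movement are filtered.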
Yes, both.
I think so, yes. More consistent overall.
Attached below.
OK, thanks :)
@XDA-Bam could you make a video where you show the phenomenon?
Video is attached. You will see that this time, it took me quite some time (~25 s) to trigger the bug while scrolling the menus. After that, it stayed and affected all direction changes. On other occasions, the bug was triggered sooner. I'm not sure whether I also encountered instances where it was triggered immediately after booting the printer, but there might have been some. (video: 2024-01-28_Encoder.Bug.mp4)
I'll keep trying to reproduce this for a few days. If I can't reproduce it this week, I'll assume this is the case and re-add the reset trick to this branch, but combined with the debouncer.
It's possible. I just managed to "un-trigger" the bug again after it had been triggered at start-up. I just scrolled around fast for 15 seconds, and bam: menu moves and encoder steps were aligned correctly again, with no lost moves after direction changes. A bit later, the bug re-triggered. So it can fix itself without power-cycling the printer. However, I played around with different fixes assuming exactly this problem (a single-pulse offset) on your previous PR and could never prevent the bug from appearing. But I might have "fixed" it incorrectly, of course 😅 I remember that I zeroed in on Marlin/Marlin/src/lcd/marlinui.cpp, line 1105 (at bb36767), because for encoders with multiple pulses per step, we're effectively rounding towards zero there if the encoder is misaligned. In such a case, we're losing up to epps-1 pulses and are not updating encoderDiff and encoderPosition correctly in the following couple of lines.
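The rounding-toward-zero effect can be shown in isolation. This is a minimal sketch, not the code at line 1105 (which isn't reproduced here); split_pulses and StepSplit are hypothetical, and encoderDiff/epps follow the names used in the thread. C++ integer division truncates toward zero, so whatever is left over, up to epps-1 pulses in either direction, never becomes a step.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: whole steps plus the pulses left over.
struct StepSplit { int8_t fullSteps; int8_t leftover; };

// Splits accumulated pulses into full steps. Integer division truncates
// toward zero, so the leftover has the same sign as encoderDiff and its
// magnitude can be anything from 0 to epps - 1.
StepSplit split_pulses(int8_t encoderDiff, int8_t epps) {
  const int8_t fullSteps = encoderDiff / epps;
  const int8_t leftover  = encoderDiff - fullSteps * epps;
  return { fullSteps, leftover };
}
```

For epps = 4, both +3 and -3 pulses yield zero full steps, so a misaligned encoder can sit three pulses away from the next step in either direction.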
I committed a fix to this PR. Would you please give it another go, @XDA-Bam?
All right! This confirms it, then.
Exactly! At first it works.
Then it slips, and we land here.
The fix candidate: if encoderDiff stays unchanged for more than 500 ms, reset it to zero.
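A minimal sketch of that fix candidate, assuming a millisecond timestamp is available on every poll. EncoderState, on_pulse and poll are hypothetical names; only encoderDiff and the 500 ms figure come from the comment above.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t IDLE_RESET_MS = 500;  // "still for >500 ms" from the fix candidate

// Hypothetical container for the pending partial step.
struct EncoderState {
  int8_t encoderDiff = 0;       // pulses not yet turned into a full step
  uint32_t last_change_ms = 0;  // time of the most recent pulse

  void on_pulse(int8_t delta, uint32_t ms) {
    encoderDiff += delta;
    last_change_ms = ms;
  }

  // Call on every poll: a partial step left untouched for more than
  // IDLE_RESET_MS is discarded, so a one-pulse offset cannot persist.
  void poll(uint32_t ms) {
    if (encoderDiff != 0 && ms - last_change_ms > IDLE_RESET_MS) encoderDiff = 0;
  }
};
```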
@dbuezas With your latest changes, the bug has changed, and it subjectively appears to trigger less often. What happens now is that you can still get occasions where one step is lost after a direction change. You can then go one encoder dent

I also tried debugging this a bit and tested the following code before your latest commit (replaces this):

```cpp
#endif // ENCODER_RATE_MULTIPLIER
  int8_t fullSteps = encoderDiff / epps;                 // truncates toward zero
  int8_t encoderError = encoderDiff - fullSteps * epps;  // leftover pulses
  if (encoderError != 0) {
    // Round away from zero: any partial step counts as a full step
    if (encoderDiff < 0) {
      NOMORE(fullSteps, -1);
    }
    else {
      NOLESS(fullSteps, 1);
    }
  }
  if (fullSteps != 0) {
    next_encoder_enable_ms = ms + BLOCK_CLICK_AFTER_MOVEMENT_MS;
    encoderDiff = encoderError != 0 ? 0 : encoderError;  // both branches yield 0, so this always clears encoderDiff
    if (can_encode() && !lcd_clicked)
      encoderPosition += (fullSteps * encoderMultiplier);
  }
} // end of the enclosing block (outside this excerpt)
```

This behaves very similarly to the current code in this PR: I still get occasions where a single encoder dent is ignored on direction changes, but the bug fixes itself if I move a couple of dents in one direction. My conclusion from this would be that
I see. I didn't think of that.
It should disappear as soon as the encoder is left alone for half a second. Anyway, this explains why the old code had an extra condition in which a full step would be taken if the encoder stayed still for a while past the halfway point of a step. The issue I have with the old code is that it assumes a fixed frequency at which the function is called, so it is kind of brittle. I'll try reading the original workaround soon; maybe it coexists fine with my debouncing logic and we can finally get all the kinks ironed out :) Thanks for your tests and feedback, @XDA-Bam!
Forgot to test that earlier. Just checked, and yes: the bug is also cleared once you leave the encoder "resting" for a second or so 👍 That also explains why it's much harder to trigger the bug after the latest commit: the half-second reset will typically clear the bug before you even notice it has happened.
Exactly!
I tried the suggested

What is (still?) there is the occasional lost menu step when entering submenus. If I enter "Configuration" from the main menu, the first movement / encoder dent in that submenu is sometimes ignored. I think that was also present in the previous encoder improvement PR. Is there perhaps some dead time hardcoded in some random location if
OK, then we're getting somewhere :)
Yes, and it bothers me too!
I tried the old reset algorithm, and it makes my encoder (2 pulses per step) feel all mushy. If I move the wheel one step at normal speed, it often resets encoderDiff midway, resulting in an unresponsive UI. The old code resets every 100 ms or less, so any rotation slower than 10 pulses per second produces no UI response. I think a good balance is to make the reset timeout 1/pulses_per_step seconds (250 ms in your setup and 500 ms in mine), and to keep the old logic that advances a step instead of resetting when encoderDiff = ±3.
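Both parts of this proposal fit in a few lines. This is a minimal sketch with hypothetical names (idle_timeout_ms, resolve_partial_step); the near_full threshold is a parameter because the comment only gives ±3, which for a 4-pulse-per-step encoder is one pulse short of a full step. How the old Marlin code implemented this is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Proposed reset timeout: 1/pulses_per_step seconds, expressed in ms.
constexpr uint32_t idle_timeout_ms(uint8_t pulses_per_step) {
  return 1000u / pulses_per_step;
}

// Called when the idle timeout expires. If the pending partial step is at
// least near_full pulses in either direction, it is committed as one step;
// otherwise it is discarded. Either way encoderDiff is cleared.
// Returns the step (-1, 0 or +1) to add to encoderPosition.
int8_t resolve_partial_step(int8_t &encoderDiff, int8_t near_full) {
  int8_t step = 0;
  if (encoderDiff >= near_full) step = 1;
  else if (encoderDiff <= -near_full) step = -1;
  encoderDiff = 0;
  return step;
}
```

Tying the timeout to pulses per step means an encoder with fewer pulses per detent gets more time to complete a step before its partial progress is dropped.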
Found the problem: one encoder line needs to be "low biased", if you will. Switching

```cpp
if (btn_b_counter >= THRESHOLD) enc.b = 1;
else if (btn_b_counter <= -THRESHOLD) enc.b = 0;
```

to

```cpp
if (btn_b_counter <= -THRESHOLD) enc.b = 0;
else if (btn_b_counter >= THRESHOLD) enc.b = 1;
```

fixes the lost steps on direction changes. More testing to follow.
I think it is just that the machine was booted while the encoder was sitting between steps. The two versions are exactly equivalent (the conditions in the if and the else can't both be true at the same time).
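The equivalence can be checked exhaustively. This sketch assumes a signed 8-bit counter and a positive THRESHOLD (4 here, a hypothetical value); the two functions mirror the two snippets above and return the new value of the filtered line. With THRESHOLD > 0, no counter value satisfies both conditions, so the order of the tests cannot matter. (With THRESHOLD = 0, a counter of 0 would satisfy both, and the order would matter.)

```cpp
#include <cassert>
#include <cstdint>

constexpr int8_t THRESHOLD = 4;  // hypothetical value
static_assert(THRESHOLD > 0, "the equivalence relies on a positive threshold");

// Original order: test the "high" condition first.
uint8_t update_high_first(int8_t counter, uint8_t b) {
  if (counter >= THRESHOLD) b = 1;
  else if (counter <= -THRESHOLD) b = 0;
  return b;  // unchanged inside the hysteresis band
}

// Swapped order: test the "low" condition first.
uint8_t update_low_first(int8_t counter, uint8_t b) {
  if (counter <= -THRESHOLD) b = 0;
  else if (counter >= THRESHOLD) b = 1;
  return b;
}

// Compares both orderings for every counter value and previous output.
bool orderings_agree() {
  for (int c = INT8_MIN; c <= INT8_MAX; ++c)
    for (uint8_t b = 0; b <= 1; ++b)
      if (update_high_first(int8_t(c), b) != update_low_first(int8_t(c), b)) return false;
  return true;
}
```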
Yeah, just flashed the version without those two lines switched, and now the problem is also not present. So rebooting did the job. Sorry for the confusion.
I'll keep my fingers crossed that you don't get any double stepping anymore.
OK, I tested 2 ms and 1 ms of debounce time. Didn't notice an immediate difference between the two. Maybe very fast scrolling is somewhat better at 1 ms. Also:
Interesting. I have some follow-up questions:
More or less than before (i.e., an older version of this PR, and without the PR)?
This sounds good; it's not something we can 100% solve. It sounds like it has improved, right?
I really don't see how the latest changes could impact that. Do you believe it never happened before the last changes? Thanks for all the testing, @XDA-Bam!
I'd say a bit more common than before the commits from yesterday/today, but with significant uncertainty in that comparison. It's not a huge difference. As written before, I also prefer some lost steps to double stepping, as it feels more predictable.
Maybe. Can't say for sure, sorry. I think that double stepping is overall hard to tell with my encoder, because the detents are a bit mushy and the "hills" between detents are pretty stable resting points, too. This muddies counting steps by feel.
We definitely fixed that a couple of weeks ago, and I have used the printer extensively in the meantime. I'm fairly confident it reappeared today (or yesterday; I didn't test the latest changes before switching to counter-based filtering).
Happy to help. Thanks for tenaciously hunting all those bugs, which often don't even affect your own machine(s)!
Marlin/src/lcd/marlinui.cpp (outdated)

```cpp
static int dt_us = 500; // the time delta in us between each run of this function
if (counter == 1000){
  static millis_t last_ms;
  dt_us = (now-last_ms);
```
The naming suggests dt_us is in µs, but the right side of the equation should result in ms values. Is this correct, or am I missing something?
Right, this counts the time in ms per 1000 calls. I do this to get an average with microsecond accuracy.
So the value is 1000 times too large in ms, but it is the right value in microseconds. I do this to avoid both calling micros() and doing divisions.
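The unit trick works because dividing a millisecond total by 1000 calls and converting to microseconds (×1000) cancel out, so the raw elapsed milliseconds already equal the average microseconds per call. A minimal sketch, assuming a millisecond clock; the excerpt above is truncated, so resetting counter and last_ms after each window is my assumption, and CallRateEstimator is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Average call period without micros() or division:
//   avg_us = elapsed_ms * 1000 (us per ms) / 1000 (calls) = elapsed_ms
struct CallRateEstimator {
  uint16_t counter = 0;  // calls in the current window
  uint32_t last_ms = 0;  // timestamp at the start of the window
  uint32_t dt_us = 500;  // initial guess, as in the reviewed code

  void tick(uint32_t now_ms) {
    if (++counter == 1000) {
      dt_us = now_ms - last_ms;  // ms per 1000 calls == average us per call
      last_ms = now_ms;          // assumed: start a new window
      counter = 0;
    }
  }
};
```

The result is only accurate to whole microseconds, and, as noted in the reply below, a highly variable call rate makes a single 1000-call average less representative.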
Makes sense, thanks.
I ran some tests, and the frequency at which the function is called is highly variable. I think that makes my approach of getting an average less stable.
And the problems are worse while printing. I further tested the combined a-b debouncing by @thinkyhead, and it is the one working best for me while printing.
That's too bad, I really liked the filtering and counter approaches. I wish there was a less costly way to get sub-ms timing precision. I just retested the combined a-b debouncing by thinkyhead (not the current state of the PR) with

I wondered why combined debouncing still works decently at all. After looking at the simulated results based on my last capture, I'd conclude that the performance heavily depends on the average data rate of the function calls. With a data rate of 12 kHz and 2 ms debounce, it looks like this (raw signal in light gray):

[plot: 12 kHz, 2 ms debounce]

And with 120 kHz, it looks like this:

[plot: 120 kHz, 2 ms debounce]

The assumed truth, in the form of the output of the original floating-point low-pass filter:

[plot: floating-point low-pass output]

It looks like, for this specific dataset, the combined debouncing is better at the lower data rate, while the per-channel debouncing is closing in and maybe even taking the lead at the higher data rate. However, both kinda suck overall at 120 kHz (the higher data rate), at least with 2 ms debounce, which may negatively affect fast boards 🤔
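For readers unfamiliar with the low-pass approach, here is a generic sketch of a first-order IIR low-pass on one encoder line followed by hysteresis thresholds. It is not the filter from the earlier version of this PR (that code isn't shown here); LowPassLine, alpha and the 0.25/0.75 thresholds are hypothetical. Because alpha is applied once per call, the effective time constant is roughly the call period divided by alpha, so it shrinks as the call rate rises; the filter also needs floating-point math on every call.

```cpp
#include <cassert>
#include <cstdint>

// Generic first-order IIR low-pass with hysteresis (illustrative only).
struct LowPassLine {
  float alpha;         // smoothing factor in (0, 1]; smaller = heavier filtering
  float level = 0.0f;  // filtered signal, between 0 and 1
  uint8_t state = 0;   // debounced output

  explicit LowPassLine(float a) : alpha(a) {}

  uint8_t update(uint8_t raw) {
    level += alpha * (float(raw) - level);  // y += alpha * (x - y)
    if (level > 0.75f) state = 1;           // upper threshold
    else if (level < 0.25f) state = 0;      // lower threshold
    return state;                           // held inside the band
  }
};
```

With alpha = 0.5, a line that goes high needs three consecutive high samples to flip the output, and a single low glitch afterwards leaves the output unchanged.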
Just to be clear, I reverted the changes, and a-b debouncing IS the state of the PR. I also had high hopes, particularly for the counter approach, since it worked so well when idling and was so cheap to compute. But hey, we're treading a really fine line here; the difference any variant makes against the bugfix branch is so massive on noisy encoders that I don't want to delay this longer than necessary :) Cool analysis, btw.
OK, I think I misunderstood what "combined debouncing" was intended to say. The way I understand the current code is that channels

"Combined" in my plots from today means that any change in either channel blocks reading/changing both channels for 2 ms.
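The two dead-time variants being compared can be sketched side by side. This is an illustration under stated assumptions, not the PR's code: DEBOUNCE_MS = 2 matches the tests in this thread, the struct names are hypothetical, and timestamps are milliseconds. In the combined variant, an accepted change on either line freezes both; in the per-channel variant, each line has its own dead time.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t DEBOUNCE_MS = 2;  // value used in the tests discussed above

// "Combined": any accepted change on A or B blocks both channels for DEBOUNCE_MS.
struct CombinedDebouncer {
  uint8_t a = 0, b = 0;           // debounced outputs
  uint32_t blocked_until_ms = 0;  // end of the shared dead time

  void update(uint8_t raw_a, uint8_t raw_b, uint32_t ms) {
    if (int32_t(ms - blocked_until_ms) < 0) return;  // inside dead time: ignore both lines
    if (raw_a != a || raw_b != b) {
      a = raw_a;
      b = raw_b;
      blocked_until_ms = ms + DEBOUNCE_MS;
    }
  }
};

// "Per-channel": each line has its own dead time, so B can change while A settles.
struct PerChannelDebouncer {
  uint8_t a = 0, b = 0;
  uint32_t a_blocked_until_ms = 0, b_blocked_until_ms = 0;

  void update(uint8_t raw_a, uint8_t raw_b, uint32_t ms) {
    if (int32_t(ms - a_blocked_until_ms) >= 0 && raw_a != a) {
      a = raw_a;
      a_blocked_until_ms = ms + DEBOUNCE_MS;
    }
    if (int32_t(ms - b_blocked_until_ms) >= 0 && raw_b != b) {
      b = raw_b;
      b_blocked_until_ms = ms + DEBOUNCE_MS;
    }
  }
};
```

The combined variant delays a legitimate B edge that arrives within 2 ms of an A edge, which fits the later observation that joint debouncing may trade double steps for missed steps at high rotation speeds.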
Oh, I see, my mistake. I intended to revert to the combined approach, but made a mistake and was one commit behind (I've added the missing commit now).
Now reverted. The main question left is whether this improves on the current code without introducing new side effects. How is it looking in the blind user test?
@XDA-Bam reported he was somehow able to trigger the missing step when changing directions. I cannot explain how, as this code shouldn't be able to do that.
As far as I remember (and could see from a quick glance at our more recent posts), the lost steps on direction changes affected the last counter-based approach. But because we reverted to simple dead-time debouncing, those test results don't apply to the current state of the PR. I'll try to re-test the current code soon.
OK, the current state of the PR on my
I couldn't provoke any lost steps after direction changes. There are also no lost steps shortly after entering a submenu. As far as I remember, per-channel debouncing gave better results for fast and especially high-speed moves on my encoder.
OK, fantastic.
Joint-channel debouncing may remove some double steps, probably at the cost of more missed steps at high rotation speeds.
Ahh, I didn't even check the code. Just compiled & flashed 😅 Doesn't change my results table, of course.
You mentioned it in the past: it's really hard to distinguish by eye. More so weeks later! :)
The next release (2.1.3) is about two weeks off. I'll merge this now, giving us enough time to refine it over the coming fortnight.
Description
This PR is a follow-up to #26501.
After multiple tests using interrupts, it became obvious that the main problems I observe with the encoder are related to contact bouncing.
Since others reported in #26501 that the old step-skipping workarounds were needed on their machines to avoid unresponsiveness when changing direction, I conclude that the contact bouncing issue is present on many boards.
Luckily, update_buttons is called extremely frequently (I measured 180 kHz on my SKR3), so a very simple debouncing scheme can be used without the need for interrupts.

This PR:
Requirements
Any board with a rotary encoder.
Benefits
Rock-solid encoder readings. No false readings and no missed steps. The encoder finally feels right.
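Because update_buttons runs so often (about 5.6 µs between calls at 180 kHz, so a 2 ms dead time spans roughly 360 polls), each poll can also decode the debounced A/B pair directly without interrupts. The sketch below uses a standard gray-code transition table; whether Marlin's encoder code uses this exact table is not stated in this PR, and QDEC_TABLE and QuadratureDecoder are hypothetical names. Invalid two-bit jumps count as zero rather than guessing a direction.

```cpp
#include <cassert>
#include <cstdint>

// Transition table indexed by (prev_AB << 2) | cur_AB, with AB = (A << 1) | B.
// +1 for the 00 -> 01 -> 11 -> 10 -> 00 direction, -1 for the reverse,
// 0 for no change or an invalid jump where both bits flip at once.
constexpr int8_t QDEC_TABLE[16] = {
  //  cur: 00  01  10  11
       0, +1, -1,  0,   // prev 00
      -1,  0,  0, +1,   // prev 01
      +1,  0,  0, -1,   // prev 10
       0, -1, +1,  0    // prev 11
};

struct QuadratureDecoder {
  uint8_t prev = 0;    // last debounced AB state, assumed 00 at start
  int16_t pulses = 0;  // accumulated pulses, similar in role to encoderDiff

  // Call on every poll with the debounced A and B levels (0 or 1).
  void update(uint8_t a, uint8_t b) {
    const uint8_t cur = uint8_t(((a & 1) << 1) | (b & 1));
    pulses += QDEC_TABLE[(prev << 2) | cur];
    prev = cur;
  }
};
```

One full gray-code cycle in one direction adds four pulses; dividing by the encoder's pulses per step then yields menu steps.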
Configurations
Related Issues
#26501
#26605