[2.0.x] Misc fixes to Planner and Stepper #11098
Tried this, but it breaks sensorless homing on my CoreXY. The axes crash into the end position. |
I tried it, and it fixed my jerky extruder motion when the acceleration was set to a low value (referenced in #11101) |
@ejtagle sorry I didn't test this sooner, I've been distracted. The host jogging bug is still there. It doesn't seem to matter how long the individual moves are (as long as they don't finish before the second command is received): if I spam the jog button you still see the same instant velocity change I've shown you, complete deceleration followed by cruise feed rate. It's pretty much entirely reproducible, so I'm not sure it's a race condition. |
As already pointed out many times, commit messages should be much shorter. Please see the developer guidelines on how to compose commit messages, which apply to this and most other GitHub projects. |
Split out configuration and LPC1768 changes and merged them separately. |
The various drivers now have at minimum 3 hardware dependent options in
/**
 * Driver support
 *
 * Drivers control the stepper motors. The printer control board
 * generates the steps that are translated by the drivers into
 * currents to operate the individual stepper motors.
 *
 * As the different drivers require different timing settings, the driver
 * type needs to be set correctly.
 * Ideally all drivers on a control board should be of the same type. Mixed
 * configurations need to comply with the slowest timing values.
 *
 * The following drivers are supported (fastest timing first):
 * TMC2xxx, A4988, LV8729, DRV8825, TB6600, TB6560
 */
#define DRIVER_TYPE TMC2xxx
The required parameters are then set in
On a side note: |
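The rule that mixed configurations must follow the slowest timing values can be sketched roughly as below. This is only an illustration: the `DriverTiming` struct, its field names, `board_timing()` and the numbers are assumptions made up for the example. They are not Marlin code and not datasheet values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-driver timing record (field names are illustrative).
struct DriverTiming {
  uint32_t min_step_pulse_ns;  // minimum STEP pulse width
  uint32_t dir_setup_ns;       // minimum DIR-to-STEP setup time
  uint32_t max_step_rate_hz;   // maximum step frequency
};

// Placeholder numbers only, NOT taken from any datasheet.
constexpr DriverTiming FAST_DRIVER = {100, 20, 5000000};
constexpr DriverTiming SLOW_DRIVER = {2000, 650, 250000};

// Most conservative combination of two timing records.
constexpr DriverTiming slowest(const DriverTiming &a, const DriverTiming &b) {
  return DriverTiming{
    a.min_step_pulse_ns > b.min_step_pulse_ns ? a.min_step_pulse_ns : b.min_step_pulse_ns,
    a.dir_setup_ns > b.dir_setup_ns ? a.dir_setup_ns : b.dir_setup_ns,
    a.max_step_rate_hz < b.max_step_rate_hz ? a.max_step_rate_hz : b.max_step_rate_hz
  };
}

// Timing a board must respect when it carries a mix of drivers.
DriverTiming board_timing(const DriverTiming *drivers, size_t count) {
  assert(drivers != nullptr && count > 0);
  DriverTiming result = drivers[0];
  for (size_t i = 1; i < count; ++i) result = slowest(result, drivers[i]);
  return result;
}
```

With one fast and one slow driver on the same board, `board_timing()` returns the slow driver's pulse width and setup time and the lower of the two step rates.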
@Sineos ... Not a bad idea. There have even been people who needed to configure those timing parameters slower than the respective datasheets require to get reliable printing...
@thinkyhead ... Sorry about the message. This PR is more of a work in progress. I really wanted your review of the endstops code change... 👍
@p3p : Right now, I assume it is not an unhandled race condition, so the sudden acceleration must be a bug in the code itself... Maybe if we find the root cause, we will (finally!) solve the layer shift issues... I guess I will have to install Pronterface somehow and use 2 serial ports to send commands and get logging results... |
@ejtagle as I said, it breaks sensorless homing |
Maybe a |
@smoki3 : Don't worry, this PR still has some issues and needs some work... For some people it fixes things; for sensorless homing, it seems to make things worse... |
@smoki3 : Are you using ENDSTOP_FILTER? ... Because I suspect the pulse sent by the TMC21xx drivers when sensorless homing is used is too short and is being filtered out. Can you disable |
@ejtagle It is already disabled 👍 |
@p3p: Maybe you could help me here: I installed Pronterface (Windows install) with an Arduino Due connected through the Serial Programming Port, set the acceleration to the minimum (100mm/s^2) on the X and Y axes, pressed the jog control, and it works. No sudden accelerations. No matter how fast I press on the X axis, it clearly accelerates, moves and then decelerates.... |
Well, now that I read carefully, it is not Pronterface... OctoPrint is the one that shows the problem... Let's try to set it up and see if we can trigger the problem... |
Possibly related on driver types (coming soon): Though not about timings but the TMC driver selection. |
@ejtagle Reduced my feedrate and acceleration to the values specified. Although it is a little harder to trigger the issue repeatedly (I guess having multiple moves already planned helps), it is still repeatable between the first 2. I can send the same gcode Octoprint creates on the events in a continuous block and have no issue, though. |
@p3p: After installing OctoPrint, yes, it is pretty terrible. I deliberately set a high feedrate and a very, very low acceleration of 100mm/s^2, and it is perfectly reproducible. I don't need a logic analyzer: the noise it makes is truly terrible... So now I'll have to figure out how to dump to a different serial port while OctoPrint has taken ownership of the main port to send commands... |
Indeed it is. I didn't want to be mean to my printer, so I've been using an MKS SBase I have on my desk and a logic analyser instead. At least it gives a clear picture of what's happening rather than "arg my printer exploded" 😉 |
Well... Everything points to the planner...
This is the OctoPrint log, but I modified Marlin to dump the block that will be executed by the planner. The 3rd block shows a final_rate of 120, and the 4th block shows an initial_rate of 17528. So the Stepper is ruled out: the problem is in the planner. |
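The mismatch described above (one block ending at a rate of 120 while the next starts at 17528) is the kind of thing a small checker over dumped blocks could flag automatically. The following is a sketch only: `BlockRates` and `first_discontinuity()` are hypothetical names, and the struct is a reduced stand-in for the real planner block, not the actual Marlin type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, reduced view of a dumped planner block: just the two rates.
struct BlockRates {
  uint32_t initial_rate;  // step rate at block entry
  uint32_t final_rate;    // step rate at block exit
};

// Returns the index of the first block whose entry rate differs from the
// previous block's exit rate by more than `tolerance`, or `count` if the
// whole chain is continuous.
size_t first_discontinuity(const BlockRates *blocks, size_t count, uint32_t tolerance) {
  assert(blocks != nullptr || count == 0);
  for (size_t i = 1; i < count; ++i) {
    const uint32_t prev_exit = blocks[i - 1].final_rate;
    const uint32_t entry = blocks[i].initial_rate;
    const uint32_t diff = entry > prev_exit ? entry - prev_exit : prev_exit - entry;
    if (diff > tolerance) return i;
  }
  return count;
}
```

Fed a sequence in which the third block ends at 120 and the fourth begins at 17528, the function returns index 3, i.e. the fourth block.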
Well, now I can watch velocity data (a Python script reading the logic analyser data stream) in real time while I "print". I definitely have a newfound appreciation for step smoothing. I was looking for any obvious anomalies related to the step loss issue and got distracted by the odd patterns I was seeing. |
I can confirm the planner is the culprit of the sudden acceleration changes. I added even more logging:
PL: is when a block is to be inserted into the motion queue, before the recalculate() planner pass
As you can see, after the recalculate() pass there is a block with initial_rate:17528, when the previous one has final_rate:120. Now the main question is why... ;) |
Exactly like this. |
@p3p, @thinkyhead : I think I finally found the cause of the sudden acceleration. It comes from a long time ago: it's the way we calculate max_entry_speed_sqr of each block, which is inherently tied to the way Jerk values are calculated. The problem is that Jerk (either the "classic" approach or the new approach) assumes that we can always chain a block with the previous one. And that is simply not true!
- If the previous block is busy or nonexistent, it can't be chained to this one.
- There is a condition on moves_queued, but that is not enough: when there are 2 movements, the first one is BUSY and the 2nd is queued, it is impossible to merge the movements...
I am thinking about how to solve this problem... It's a new race condition between the Jerk computations and the stepper ISR... And, most probably, forcing a planner.sync() between each layer could trigger the "sudden acceleration" bug and easily cause layer shifts... |
@Sineos may I suggest a driver selection system just like the board-pins one? A file for each type of driver (it may turn out to be useful if in future we add other driver-dependent options to the firmware, to keep them more organised); then a Drivers.h file with a list of macros which point to those files... Let's say something like this:
Configuration.h
//Select from Drivers.h the slowest driver you have installed on your board
#define SLOWEST_DRIVER_INSTALLED DRIVER_DRV8825
Drivers.h
#define DRIVER_DRV8825 ./drivers/driver_DRV8825.h
#define DRIVER_A4988 ./drivers/driver_A4988.h
#define DRIVER_TMC2130 ./drivers/driver_TMC2130.h
[...]
driver_DRV8825.h
//Every driver-specific option
#define MINIMUM_STEP_PULSE blabla
#define blabla blabla
Then we can load the driver's options like this:
Wouldn't this lead to much cleaner code, or am I over-complicating? Take this just as a little suggestion. Anyway, I feel like we should definitely talk about this in another issue... |
That should have been ensured in the past, for non-busy blocks at least. Except for BUSY blocks, which other blocks can/should not be allowed to have their acceleration/deceleration updated? |
I would like to better understand the endstops changes here. What we ultimately want is:
It looks like this PR might be removing that last item. |
@thinkyhead : Let's go item by item. All of this is hard to follow and understand.
First, the "sudden acceleration" issue: I isolated the motion planner and compiled it as a separate project to be able to understand how it works, and I used the logs I attached to reproduce the problem. It goes more or less like this:
From the Planner's point of view, if the block pointed to by the tail is BUSY (owned by Stepper), then that block must be considered read-only. If the block is not owned yet, it can be modified by the Planner.
So, the first problem is in the Planner::_populate_block() method, which uses a call to movesplanned() to determine whether block->max_entry_speed_sqr is 0 (when no movements were queued) or a different value calculated using either JUNCTION_DEVIATION or the previous jerk algorithm.
That condition is correct when no block is executing: with no previous block the motors are stopped (the planner ensures the last queued block always ends at 0 speed), so limiting the maximum junction speed to 0 is right. But if movesplanned() > 0, the code assumes the maximum allowable junction speed can be determined by JUNCTION_DEVIATION or the previous algorithm, and that IGNORES the fact that the tail block may be BUSY, so its exit junction speed is READ ONLY and can't be altered.
What happens next is that the planner is allowed to replan this new block freely, specifically its entry speed, and that is simply not correct, as the previous block (the BUSY one) has a well-defined exit speed that can't be altered.
To add to the difficulty of handling this condition, the tail block could become BUSY while the planner is running. I have implemented race protection, but nothing enforces that the following block's entry speed respects the exit speed of the BUSY block. And the BUSY block can change as we plan!
How did it work previously? It was also buggy, I am afraid. Previously, the forward pass ignored the first 2 blocks (the tail and the one next to the tail). That is completely incorrect: it was shown that, because of it, planning of blocks was also erratic, caused acceleration spikes, and jerk was not respected (I could try to find the thread... @Sebastianv650 did all the investigation).
So, the answer seems to be to force a block's entry speed to the exit speed of the previous (closer to being executed) block if that block becomes busy. That is what I am working on right now. I also want to make sure Planner::_populate_block() forces the entry speed of a block to be the exit speed of the busy block if there is just one queued block and it happens to be the busy block.
Fortunately, when more than 2 blocks are queued, none of this logic is required. That is why it "seems" to work in most cases, but not all of them. |
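The three cases described in this comment (empty queue, a single queued block that is already busy, and everything else) can be sketched roughly as follows. This is a simplified model written for illustration, not the actual Planner::_populate_block() code: `QueueState`, its fields and the free function `max_entry_speed_sqr()` are assumptions, and the junction limit is taken as an input rather than computed.

```cpp
#include <cassert>

// Hypothetical, simplified planner state (illustration only).
struct QueueState {
  int   moves_planned;        // blocks in the queue, including a busy one
  bool  tail_is_busy;         // tail block is currently owned by the stepper ISR
  float busy_exit_speed_sqr;  // fixed exit speed^2 of the busy block, (mm/s)^2
};

// junction_limit_sqr: the limit from JUNCTION_DEVIATION or classic jerk,
// assumed to be computed elsewhere.
float max_entry_speed_sqr(const QueueState &q, float junction_limit_sqr) {
  assert(q.moves_planned >= 0);
  // Nothing queued: the motors are stopped, so the new block starts from rest.
  if (q.moves_planned == 0) return 0.0f;
  // Only the busy block precedes us: its exit speed is read-only,
  // so the new block has to enter at exactly that speed.
  if (q.moves_planned == 1 && q.tail_is_busy) return q.busy_exit_speed_sqr;
  // Otherwise the previous block can still be replanned.
  return junction_limit_sqr;
}
```

The middle branch is the one the comment says was missing: without it, a new block following a lone busy block would be given the junction limit as its entry speed even though the busy block's exit speed can no longer be changed to match.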
@thinkyhead : Regarding the endstops problems, that is why I wanted your review. Basically, if you look closely at the modifications I made to the endstops code, in the Endstops::update() function, everything above
#define TEST_ENDSTOP(ENDSTOP) (TEST(state(), ENDSTOP))
should only be updating the All the updates to the The periodic update of
But, obviously, something is escaping me... Probably with CORE printers, as @smoki3 reports it breaks sensorless homing even with ENDSTOP_NOISE_FILTER disabled. There was also a report of this PR fixing G28/G29/G30 (but I suspect that was on a non-Core, non-Delta printer)... |
@ejtagle edit: ignore that, I misinterpreted the data. Always remember when you have added a filter, or it can confuse you. The actual data shows it's the jerk setting and the frequency interplay noise. |
@p3p : I could be wrong, but I suspect the acceleration spikes you are seeing are caused by the extra processing time the stepper ISR requires to set up a new movement block and start executing it. The ISR has a loop that recovers the lost time (catches up) by running the ISR body at an accelerated pace. |
Some speculative questions based on your notes…
Would it be useful to have a
Did we previously use a
I was under the assumption that every block's entry speed is automatically matched up to the exit speed of the block ahead of it, and we could almost always count on having at least two blocks to work with (unless the planner gets starved, in which case we're obligated to permit blocks to decelerate to the minimum jerk speed). How did that get broken? |
GitHub formatting tip: Only a single backtick should be used for inline monospace formatting. Three backticks with optional file-format specifier for multi-line comments… int i = 10; See https://guides.github.com/features/mastering-markdown/ for more markdown help. |
A separate PR would be better. So that they can be evaluated separately and the first one that's ready can be merged while the other may still be under development. |
@thinkyhead There was never a critical section protecting the planner. It would simply take too much time... At some point, when the first movement was split in 2 parts and then queued, I seem to recall the whole planner running with interrupts disabled. As said previously, this partly fixes it, but does not handle all possibilities. Your last assumption still holds exactly as you state it. I completely agree with splitting this into 2 PRs, one for the planner, the other for the Endstops. |
Yes, and I believe that I already fixed that last week. When endstop filtering is enabled it is periodically updated from the temperature ISR.
Are they? I just got back from ERRF and have over 100 issues in the queue at the moment, so I haven't studied the changes closely, but I will look closer when the fog clears.
Yep. This code is brittle, so it's a bit reckless for us to keep uprooting it. |
Well, seems to be fixed. Octoprint moves smoothly, and I did a full print and no issues or strange noises at all... 👍 |
This commit message is too long and uses the wrong tense. Grr…
Fixing, squashing, and rebasing. |
Good to know. I will shortly break out the endstop changes into a separate PR. |
Now I realise you are complaining about the commit messages, not the PR message.... 👍 |
Long commit messages get cut off in git tool interfaces. The long description is the appropriate place for the longer description, and git tools will show the whole thing, even including Markdown formatting and links. |
Let's hope this one fixes #10446 ... finally ! 👍 |
- Allow planner to alter the deceleration phase of the currently executing block.
- Remove BUSY flag, as it is NON ATOMIC to set bits in the Stepper ISR and Planner at the same time.
Can someone explain the fix to me in the first place? I didn't check every change. variable flag of
Isn't there still a race condition or am I just overlooking something? Possible failure on my side in my theoretical model:
SBI(current->flag, BLOCK_BIT_RECALCULATE);
// But there is an inherent race condition here, as the block may have
// become BUSY just before being marked RECALCULATE, so check for that!
if (stepper.is_block_busy(current)) {
  // Block became busy. Clear the RECALCULATE flag (no point in
  // recalculating BUSY blocks). And don't set its speed, as it can't
  // be updated at this time.
  CBI(current->flag, BLOCK_BIT_RECALCULATE);
}
what could (in my opinion) happen in assembler
What am I overlooking here? |
@Sickeroni, @ejtagle — Can I assume we resolved this? |
@thinkyhead : Yes, we resolved it. CBI/SBI are "safe" to use in To determine if a block is " Disabling interrupts adds extra latency to the already critical stepper ISR. It is a NO NO, and a lot of pain went into removing those unneeded critical sections. |
I initially implemented the ability to alter a running block, but found a much bigger problem there: block_t.flags is modified from the ISR and the main thread at the same time!!! This is a VERY BIG mistake, as the main thread could be removing the BUSY bit from the block while trying to set any other bit, if the Stepper ISR happens to execute in the middle of the read-modify-write operation! As this is not acceptable at all, the BUSY bit was removed and a different approach is used.
So the solution was to detect the busy block by comparing Stepper::current_block with the one we want and, after setting RECALCULATE for a block (which should prevent the ISR from starting to execute the block if it hasn't already), to check if the block is busy. If it is, do not touch it at all.
As @p3p found, correct the LPC1768 Serial priority. This seems to partially fix issues in #11047
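The read-modify-write hazard on block_t.flags, and the pointer comparison used instead of a BUSY bit, can be illustrated with a small deterministic simulation. This is a sketch under stated assumptions: the bit positions, the `Block` struct and both functions are hypothetical, and the "ISR" is simulated inline rather than being a real interrupt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions, for illustration only.
constexpr uint8_t BIT_RECALCULATE = 1 << 0;
constexpr uint8_t BIT_BUSY        = 1 << 1;

// Simulates "flags |= BIT_RECALCULATE" on the main thread being interrupted
// between its load and its store by an ISR that sets BUSY.
uint8_t interrupted_set_recalculate(uint8_t flags) {
  uint8_t tmp = flags;     // main thread: load
  flags |= BIT_BUSY;       // simulated ISR runs here and sets BUSY in memory
  tmp |= BIT_RECALCULATE;  // main thread: modify its stale copy
  flags = tmp;             // main thread: store, overwriting the ISR's update
  return flags;
}

struct Block { uint8_t flags; };

// Busy detection without a shared flag bit: in this model only the ISR ever
// writes `current_block`, and the planner merely compares against it.
bool is_block_busy(const Block *current_block, const Block *candidate) {
  assert(candidate != nullptr);
  return candidate == current_block;
}
```

Starting from `flags == 0`, the simulated interleaving ends with only RECALCULATE set: the BUSY bit written by the ISR is lost. The pointer comparison avoids that failure mode because the planner never writes the value the ISR owns.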