[2.0.x] Misc fixes to Planner and Stepper #11098

ejtagle · 2018-06-23T23:39:41Z

I initially implemented the ability to alter a running block, but found a much bigger problem there:

SBI and CBI macros are not atomic. so we CAN´T use CBI and SBI in block_t.flags from the ISR and main thread at the same time !!! ... This is a VERY BIG mistake, as the main thread could be removing the BUSY bit from the block, while trying to set any other bit, if it happens the Stepper ISR executes in the middle of the read-modify-write operation! - So, as this is not acceptable at all, the BUSY bit was removed and a different approach is used.
There were several unhandled race conditions between the Stepper ISR and the Planner. Specifically, the most dangerous one is when, at the planner, we set the RECALCULATE flag, but at that point the block could already be BUSY just because the Stepper ISR took it from the movement queue. Results are that we end up partially updating the entry and exit speeds of the block, but not updating the block parameters that are used by the stepper ISR. The net result is that the planner considers the block joined with the next, when it wasn't, and that causes the next block to start at maximum speed - Leading to lost steps. That is what I suspect is happening with Pronterface and the Jog problem.

So the solution was to detect the busy block by comparing Stepper::current_block with the one we want, to, after setting RECALCULATE for a block (that should prevent the ISR to start executing the block if it hasn't already) check if the block is busy, and if it is, do not touch it at all.

As @p3p found, correct the LPC1768 Serial priority. This seems to partially fix issues in #11047

smoki3 · 2018-06-25T07:45:16Z

Tried this. But it breaks sensorless homing on my CoreXY. The axis are crashing in the endposition

nebbian · 2018-06-25T07:59:00Z

I tried it, and it fixed my jerky extruder motion when the acceleration was set to a low value (referenced in #11101)

p3p · 2018-06-25T10:45:13Z

@ejtagle sorry I didn't test this sooner been distracted, the host jogging bug is still there, doesn't seem to matter how long the individual moves are (as long as they don't finish before the second command is received) if I spam the jog button you still see the same instant velocity change I've showed you, complete deceleration followed by cruise feed rate. It's pretty much entirely reproducible so I'm not sure its a race condition.

thinkyhead · 2018-06-25T15:57:51Z

As already pointed out many times, commit messages should be much shorter. Please see the developer guidelines on how to compose commit messages which applies to this and most other GitHub projects.

thinkyhead · 2018-06-25T16:27:47Z

Split out configuration and LPC1768 changes and merged them separately.
Rebased and squashed commits.

Sineos · 2018-06-25T18:00:43Z

Split out configuration and LPC1768 changes and merged them separately.

The various drivers now have at minimum 3 hardware dependent options in Configuration_adv.h. Setting them correctly now has become an integral part of Marlin's configuration. For sake of user-friendliness I would propose to move them to Configuration.h in a simplified manner:

/**
 * Driver support
 *
 * Drivers control the stepper motor. The printer control boards
 * generates the steps that are translated by the drivers  into 
 * currents to operate the individual stepper motors.
 * 
 * As the different drivers require different timing settings, the driver
 * type needs to be set correctly.
 * Ideally all drivers on a control board should be of the same type. Mixed
 * configurations need to comply to the slowest timing values.
 *
 * Following drivers are supported (fastest timing first): 
 * TMC2xxx, A4988, LV8729, DRV8825, TB6600, TB6560
 */
#define DRIVER_TYPE TMC2xxx

The required parameters are then set in Conditionals_post.h

On a side note:
I have seen printers with mixed drivers. It should somehow be made clear to the users that they need to set it to the one with the slowest timing.

ejtagle · 2018-06-25T18:11:43Z

@Sineos ... Not a bad idea, even there have been people that need to configure those timing parameters even slower than what it should be needed according to the respective datasheets to get reliable printing...

@thinkyhead ... Sorry about the message. This PR is more like a work-in-progress thing.. I really wanted your review on the endstops code change... 👍

@p3p : Right now, I assume it is not an unhandled race condition... So it must be concluded the sudden acceleration is a bug of the code itself... Maybe if we find the root cause, we will (finally!) solve the layer shift issues... Guess i will have to install somehow Pronterface and use 2 serial ports to send commands and get logging results...

smoki3 · 2018-06-25T18:17:16Z

@ejtagle as I told it breaks sensorless homing

Sineos · 2018-06-25T18:19:00Z

Not a bad idea, even there have been people that need to configure those timing parameters even slower than what it should be needed according to the respective datasheets to get reliable printing.

Maybe a custom driver could be introduced that allows to manually define the timing. Anyway, I think the settings should move from advanced to standard config, since they are a must like the board type.

ejtagle · 2018-06-25T18:25:28Z

@smoki3 : Don´t worry, this PR still has some issues and needs some work... For some people it fixes things, for sensorless homing, seems it does worsen things...

ejtagle · 2018-06-25T18:32:20Z

@smoki3 : Are you using ENDSTOP_FILTER ? ... Because i do suspect the pulse that is being sent by the TMC21xx drivers when sensorless homing is being used is too short and is being filtered out. Can you disable ENDSTOP_NOISE_FILTER and try again ?

smoki3 · 2018-06-25T18:35:19Z

@ejtagle It is already disabled 👍

ejtagle · 2018-06-25T20:24:13Z

@p3p: Maybe you could help me here: I have installed Pronterface (Windows install), using an Arduino Due, connected through the Serial Programming Port, set the acceleration to the minimum (100mm/s^2) on X and Y axis, press on the jog control, and it works. No sudden accelerations. It does not matter how fast I press into the X axis, it clearly accelerates, moves and then deaccelerates....

ejtagle · 2018-06-25T20:25:37Z

ejtagle · 2018-06-25T20:30:54Z

Well, now that i read carefully, it is not pronterface... It is Octoprint the one that shows the problem... Let´s try to setup it and see if we are able to trigger the problem...

teemuatlut · 2018-06-25T20:43:22Z

Possibly related on driver types (coming soon):
teemuatlut@00d735e#diff-18dab9d337cee2f9d2209940ed08dbacR1085

Though not about timings but the TMC driver selection.

p3p · 2018-06-25T20:55:41Z

@ejtagle Reduced my feedrate and acceleration to the values specified and although it is a little harder to trigger the issue repeatedly (I guess multiple currently planned moves helps) it is still repeatable between the first 2, I can send the same gcode Octoprint creates on the events in a continuous block and have no issue though.

Send: G91
Recv: ok
Send: G1 X10 F3000
Recv: ok
Send: G90
Recv: ok
Send: G91
Recv: ok
Send: G1 X10 F3000
Recv: ok
Send: G90
Recv: ok
Send: G91
Recv: ok
Send: G1 X10 F3000
Recv: ok
Send: G90
Recv: ok

ejtagle · 2018-06-25T21:23:34Z

@p3p: After installing OctoPrint, yes, it is mostly terrible. I set on purpose a high feedrate an a very very low acceleration of 100mm/s^2 and it is perfectly reproducible. I don´t need a logic analyzer. Just the noise it does is truly terrible...

So, now I´ll have to figure out how to dump to a different serial port while OctoPrint has taken ownership of the main port to send commands...

p3p · 2018-06-25T21:42:06Z

I don´t need a logic analyzer. Just the noise it does is truly terrible...

Indeed it is, I didn't want to be mean to my printer so I've been using an MKS SBase I have on my desk and a logic analyser instead, at least it gives a clear picture of whats happening rather than "arg my printer exploded" 😉

ejtagle · 2018-06-25T22:24:17Z

Well... Everything points to the planner ...

Send: G91
Recv: ok
Send: G1 X10 F18000
Recv: ok
Send: G90
Recv: ok
Recv: flag:0
Recv: nominal_speed_sqr:90000.00
Recv: entry_speed_sqr:0.00
Recv: max_entry_speed_sqr:0.00
Recv: millimeters:10.00
Recv: acceleration:2400.00
Recv: direction_bits:0
Recv: steps[0]:800
Recv: steps[1]:0
Recv: steps[2]:0
Recv: step_event_count:800
Recv: accelerate_until:400
Recv: decelerate_after:400
Recv: initial_rate:120
Recv: nominal_rate:24000
Recv: cruise_rate:12394
Recv: final_rate:120
Recv: acceleration_steps_per_s2:192000
Send: G91
Recv: ok
Send: G1 X10 F18000
Recv: ok
Send: G90
Recv: ok
Recv: flag:0
Recv: nominal_speed_sqr:90000.00
Recv: entry_speed_sqr:0.00
Recv: max_entry_speed_sqr:0.00
Recv: millimeters:10.00
Recv: acceleration:2400.00
Recv: direction_bits:0
Recv: steps[0]:800
Recv: steps[1]:0
Recv: steps[2]:0
Recv: step_event_count:800
Recv: accelerate_until:400
Recv: decelerate_after:400
Recv: initial_rate:120
Recv: nominal_rate:24000
Recv: cruise_rate:12394
Recv: final_rate:120
Recv: acceleration_steps_per_s2:192000
Send: G91
Recv: ok
Send: G1 X10 F18000
Recv: ok
Send: G90
Recv: ok
Recv: flag:0
Recv: nominal_speed_sqr:90000.00
Recv: entry_speed_sqr:0.00
Recv: max_entry_speed_sqr:0.00
Recv: millimeters:10.00
Recv: acceleration:2400.00
Recv: direction_bits:0
Recv: steps[0]:800
Recv: steps[1]:0
Recv: steps[2]:0
Recv: step_event_count:800
Recv: accelerate_until:400
Recv: decelerate_after:400
Recv: initial_rate:120
Recv: nominal_rate:24000
Recv: cruise_rate:12394
Recv: final_rate:120
Recv: acceleration_steps_per_s2:192000
Send: G91
Recv: ok
Send: G1 X10 F18000
Recv: ok
Send: G90
Recv: ok
Recv: flag:0
Recv: nominal_speed_sqr:90000.00
Recv: entry_speed_sqr:48000.00
Recv: max_entry_speed_sqr:90000.00
Recv: millimeters:10.00
Recv: acceleration:2400.00
Recv: direction_bits:0
Recv: steps[0]:800
Recv: steps[1]:0
Recv: steps[2]:0
Recv: step_event_count:800
Recv: accelerate_until:0
Recv: decelerate_after:0
Recv: initial_rate:17528
Recv: nominal_rate:24000
Recv: cruise_rate:17528
Recv: final_rate:120
Recv: acceleration_steps_per_s2:192000
Send: G91
Recv: ok
Send: G1 X10 F18000
Recv: ok
Send: G90
Recv: ok
Recv: flag:0
Recv: nominal_speed_sqr:90000.00
Recv: entry_speed_sqr:0.00
Recv: max_entry_speed_sqr:0.00
Recv: millimeters:10.00
Recv: acceleration:2400.00
Recv: direction_bits:0
Recv: steps[0]:800
Recv: steps[1]:0
Recv: steps[2]:0
Recv: step_event_count:800
Recv: accelerate_until:400
Recv: decelerate_after:400
Recv: initial_rate:120
Recv: nominal_rate:24000
Recv: cruise_rate:12394
Recv: final_rate:120
Recv: acceleration_steps_per_s2:192000
Recv:  T:21.25 /0.00 B:19.80 /0.00 @:0 B@:0
Recv:  T:21.33 /0.00 B:19.53 /0.00 @:0 B@:0
Send: G91
Recv: ok

This is the Octoprint log, but i modified Marlin to dump the block that will be executed by the planner. The 3rd block shows a final rate of 120, and the 4th block shows an initial_rate of 17528

So, Stepper discarded. The problem is in the planner

p3p · 2018-06-25T23:45:41Z

Well now I can watch velocity data (python script reading logic analyser data stream) in real-time while I "print" I definitely have new found appreciation for step smoothing, I was looking for any obvious anomalies for the step loss issue and got distracted by the odd patterns I was seeing.

no adaptive step smoothing

adaptive step smoothing

ejtagle · 2018-06-26T00:23:17Z

I can confirm it is the planner the culprit of the sudden acceleration changes: I added even more logging:

Recv:  T:21.45 /0.00 B:20.04 /0.00 @:0 B@:0
Send: G91
Recv: ok
Send: G1 X10 F18000
Recv: PL/initial_rate:0
Recv: PL|nominal_rate:24000
Recv: PL|cruise_rate:0
Recv: PL\final_rate:0
Recv: ML/start11
Recv: ML##index:11
Recv: ML#initial_rate:120
Recv: ML#nominal_rate:24000
Recv: ML#cruise_rate:12394
Recv: ML#final_rate:120
Recv: ML\end12
Recv: ok
Send: G90
Recv: ok
Recv: SP/initial_rate:120
Recv: SP|nominal_rate:24000
Recv: SP|cruise_rate:12394
Recv: SP\final_rate:120
Send: G91
Recv: ok
Send: G1 X10 F18000
Recv: PL/initial_rate:0
Recv: PL|nominal_rate:24000
Recv: PL|cruise_rate:0
Recv: PL\final_rate:0
Recv: ML/start11
Recv: ML##index:11
Recv: ML#initial_rate:120
Recv: ML#nominal_rate:24000
Recv: ML#cruise_rate:12394
Recv: ML#final_rate:120
Recv: ML##index:12
Recv: ML#initial_rate:17528
Recv: ML#nominal_rate:24000
Recv: ML#cruise_rate:17528
Recv: ML#final_rate:120
Recv: ML\end13
Recv: ok
Send: G90
Recv: ok
Recv: SP/initial_rate:17528
Recv: SP|nominal_rate:24000
Recv: SP|cruise_rate:17528
Recv: SP\final_rate:120
Send: G91

PL: Is when a block is to be inserted into the motion queue, before the recalculate() planner pass
ML: Is the motion queue dump of block
SP: Is the block that is taken by the stepper ISR.

As you see, after the recalculate() pass, there is a block with initial_rate:17528, when the previous one has final_rate:120. Now the main question is why... ;)

Sineos · 2018-06-26T05:38:59Z

@teemuatlut

Possibly related on driver types (coming soon):

#define  X_DRIVER_TYPE 2130 // [2130, 2208, 2660]
#define  Y_DRIVER_TYPE 2130
#define  Z_DRIVER_TYPE 2130
#define X2_DRIVER_TYPE 2130
#define Y2_DRIVER_TYPE 2130
#define Z2_DRIVER_TYPE 2130
#define E0_DRIVER_TYPE 2130
#define E1_DRIVER_TYPE 2130
#define E2_DRIVER_TYPE 2130
#define E3_DRIVER_TYPE 2130
#define E4_DRIVER_TYPE 2130

Exactly like this.
IMHO this should be extended to include all supported drivers and the timing settings and be part of Configuration.h

ejtagle · 2018-06-26T06:28:08Z

@p3p, @thinkyhead : I think i finally found the problem of the sudden acceleration. This comes from a long time ago. It´s the way we calculate max_entry_speed_sqr of each block, inherently associated to the way Jerk values are calculated.

The problem is that Jerk (either the "classic" approach or the new approach assume that we always can chain blocks with the previous one.

And that is simply not true! - If the previous block is busy or non existant, it can´t be chained to this one - There is a condition moves_queued, but that is not enough, as when there are 2 movements, and the first movement is BUSY, and the 2nd is queued, then it is impossible to merge movements... I am thinking on how to solve this problem... It´s a new race condition between the Jerk computations and the stepper ISR...

And, mostly probably, forcing a planner.sync() between each layer could trigger the "sudden acceleration" bug and easily cause layer shifts...

iosonopersia · 2018-06-26T07:46:03Z

@Sineos may I suggest you a driver selection system just like the board-pins one? A file for each type of driver (it may turn out to be useful if in future we'll add other driver-dependent options in the firmware, to keep them more organised); then a Drivers.h file with a list of macros which points to those files... Let's say something like this:

Configuration.h

//Select from Drivers.h the slowest driver you have installed on your board
#define SLOWEST_DRIVER_INSTALLED  DRIVER_DRV8825

Drivers.h

#define DRIVER_DRV8825  ./drivers/driver_DRV8825.h
#define DRIVER_A4988  ./drivers/driver_A4988.h
#define DRIVER_TMC2130  ./drivers/driver_TMC2130
[...]

driver_DRV8825.h

//Every driver-specific options
#define MINIMUM_STEP_PULSE blabla
#define blabla blabla

Then we can load the driver's options like this:
#include SLOWEST_DRIVER_INSTALLED

Wouldn't this bring to a much cleaner code, or am I over-complicating? Take this just as a little suggestion. Anyway, I feel like we should definitely talk about this in another issue...

iosonopersia · 2018-06-26T11:01:38Z

@tig33r I opened a FR here: #11119

thinkyhead · 2018-06-26T20:13:50Z

the "classic" approach or the new approach assume that we always can chain blocks with the previous one

That should have been ensured in the past, for non-busy blocks at least. Except for BUSY blocks, which other blocks can/should not be allowed to have their acceleration/deceleration updated?

thinkyhead · 2018-06-26T20:21:39Z

I would like to better understand the endstops changes here. What we ultimately want is:

If endstop filtering is enabled, keep checking endstops/probe status so that a validated endstops/probe state is available immediately when starting an endstop or probe move.
Don't constantly check endstops/probe if endstop filtering is disabled.
When endstop filtering is disabled trust the endstops/probe states on a single read of the pins.
Don't bother marking any endstop as "triggered" (for the purpose of announcing the hit or interrupting the move) unless its associated axis is moving towards the endstop.

It looks like this PR might be removing that last item.

ejtagle · 2018-06-26T21:50:56Z

@thinkyhead : Lets go item by item. All those are hard to follow and understand.

First, for the "sudden acceleration" issues: I did isolate and take out the motion planner and compiled a separate project with it, to be able to understand how it works - And i used the logs i attached to reproduce the problem.

It goes more or less like this:
-It must be noted that the motion queue tail points either to the busy block (if the stepper has already taken ownership of it) or points to the first movement that must be executed.

If you see from the Planner point of view, if the block pointed by the tail is BUSY (owned by Stepper), then that block must be considered readonly. If the block is not owned yet, then the block can be modified by the Planner.

So, the first problem is in the Planner::_populate_block() method, that uses a call to movesplanned() to determine if the block->max_entry_speed_sqr is 0 (when no movements were queued) or a different value calculated using either JUNCTION_DEVIATION or the previous jerk algorithm

But using movesplanned() != 0 as a condition assumes that either no block was executing (so it is correct to limit the maximum junction speed (with a non existant block) to 0 (as if there wasn´t a previous block, then the motors are stopped, as the planner ensures the last queued block always ends in 0 speed)

But now, if movesplanned()>0, then the code assumes the maximum allowable junction speed can be determined by the JUNCTION_DEVIATION or the previous algorithm, and that IS IGNORING the fact that maybe the tail block is BUSY, thus its exit junction speed is READ ONLY and can´t be altered.

So, what happens next is that the planner is allowed to replan this new block freely, specifically its entry speed, and that is simply not correct at all, as the previous block (the BUSY one) has a very defined exit speed that can't be altered

To add to the difficulty of handling this condition, the tail block could become BUSY while the planner is running. I have implemented a race protection prevention but, nothing is enforcing the following block entry speed to respect the exit speed of the BUSY block.

And the BUSY block can change as we plan! ...

How did it work previously ? : It was also buggy, i am afraid. Previously, the forward pass was ignoring the first 2 blocks (the tail and the next to the tail one) - And that is completely incorrect, as it was shown that due to that, planning of blocks was also erratic and caused acceleration spikes and jerk was not respected (i could try to find the thread... @Sebastianv650 did all the investigation)

So, the answer seems to enforce entry speed to the exit speed of the previous (closer to be executed) block if that block becomes busy. That is what i am working on right now.

And also make sure Planner::_populate_block() enforces the entry speed of a block to be the exit speed of the busy block, is there is just one queued block and it happens to be the busy block

Fortunately, when more than 2 blocks are queued, none of this logic is required, that its why it "seems" to work in mostly all cases, but not all of them.

ejtagle · 2018-06-26T22:02:58Z

@thinkyhead : Regarding the endstops problems, that is why i wanted your review. Basically, if you closely look at the modifications i did to the endstops code, in the Endstops::update() function, everything above

#define TEST_ENDSTOP(ENDSTOP) (TEST(state(), ENDSTOP))

Should only be updating the live_state variable, that then is fed to the ENDSTOP_NOISE_FILTER algorithm.

All the updates to the hit_state variable and the calls to planner.endstop_triggered() functions are kept and done with the exact same conditions as before.

The periodic update of live_state is needed for the filtering algorithm to work, because, previously, as soon as motors stopped, live_state was not updated anymore, and the filter would eventually accept the value!

But, obviously, something is escaping me.. Probably with CORE printers, as @smoki3 reports it breaks sensorless homing even with ENDSTOP_NOISE_FILTER disabled, but there was a report on this PR fixing G28/G29/G30 (but i do suspect on a nonCore nonDelta printer)...

p3p · 2018-06-26T22:20:05Z

@ejtagle edit: ignore that, I misinterpreted the data, always remember when you add a filter or it can confuse you, actual data shows its the jerk setting and the frequency interplay noise.

ejtagle · 2018-06-26T22:57:23Z

@p3p : I could be wrong, but i do suspect the acceleration spikes you are seeing are caused by the extra processing time the stepper ISR requires to setup a new movement block and start executing it. The ISR has a loop that will recover the used time (catch up) by accelerated execution of the ISR body.
I have to admit that i did not expect that to happen on a 32bit board...
It would be interesting to measure ISR execution times at block change

thinkyhead · 2018-06-26T23:12:17Z

Some speculative questions based on your notes…

…using movesplanned() != 0 as a condition assumes that either no block was executing…

Would it be useful to have a moves_waiting function that checks if the first block is marked BUSY and returns one less than the movesplanned value in that case?

…the tail block could become BUSY while the planner is running…

Did we previously use a CRITICAL_SECTION to prevent that?

…the answer seems to [be to] enforce entry speed to the exit speed of the previous (closer to [being] executed) block if that block becomes busy…

I was under the assumption that every block's entry speed is automatically matched up to the exit speed of the block ahead of it, and we could almost always count on having at least two blocks to work with (unless the planner gets starved, in which case we're obligated to permit blocks to decelerate to the minimum jerk speed). How did that get broken?

thinkyhead · 2018-06-26T23:19:46Z

GitHub formatting tip:

Only a single backtick should be used for inline monospace formatting.

Three backticks with optional file-format specifier for multi-line comments…

int i = 10;

See https://guides.github.com/features/mastering-markdown/ for more markdown help.

thinkyhead · 2018-06-26T23:23:39Z

Regarding the endstops problems, that is why i wanted your review.

A separate PR would be better. So that they can be evaluated separately and the first one that's ready can be merged while the other may still be under development.

ejtagle · 2018-06-26T23:25:39Z

@thinkyhead
To your first question, yes. I am trying a new approach that restricts the blocks the planner touchs, removing the busy block, and, seems to be working...

There was never a critical section protecting the planner. It is just too much time... At some point, when the first movement was splitted in 2 parts and then queued, i seem to recall all the planner running with interrupts disabled - As said previously, this partly fixes it, but does not handle all possibilities

Your last assumtion still holds exactly as you state it. almost always : Seems Octoprint managed to trigger that ... ;)

Completely agree into splitting this into 2 PRs, one for the planner, the other for the Endstops

thinkyhead · 2018-06-26T23:31:15Z

The periodic update of live_state is needed for the filtering algorithm to work

Yes, and I believe that I already fixed that last week. When endstop filtering is enabled it is periodically updated from the temperature ISR.

All the updates to the hit_state variable and the calls to planner.endstop_triggered() functions are kept and done with the exact same conditions as before.

Are they? I just got back from ERRF and have over 100 issues in the queue at the moment, so I haven't studied the changes closely, but I will look closer when the fog clears.

…it breaks sensorless homing even with ENDSTOP_NOISE_FILTER disabled…

Yep. This code is brittle, so it's a bit reckless for us to keep uprooting it.

ejtagle · 2018-06-27T03:32:22Z

Well, seems to be fixed. Octoprint moves smoothly, and I did a full print and no issues or strange noises at all... 👍

thinkyhead · 2018-06-27T03:40:03Z

This commit message is too long and uses the wrong tense. Grr…

Making sure the planner does not update blocks whose entry speed can't be altered. Also made sure to start from 0 entry speed if no planned blocks

Fixing, squashing, and rebasing.

thinkyhead · 2018-06-27T03:42:48Z

Well, seems to be fixed.

Good to know. I will shortly break out the endstop changes into a separate PR.

ejtagle · 2018-06-27T03:50:55Z

Now i realise you are complaining about the commit messages, not the PR message.... 👍

thinkyhead · 2018-06-27T04:01:21Z

Now i realise you are complaining about the commit messages

Long commit messages get cut off in git tool interfaces. The long description is the appropriate place for the longer description, and git tools will show the whole thing, even including Markdown formatting and links.

ejtagle · 2018-06-27T04:24:51Z

Let's hope this one fixes #10446 ... finally ! 👍

- Allow planner to alter the deceleration phase of the currently executing block. - Remove BUSY flag, as it is NON ATOMIC to set bits in the Stepper ISR and Planner at the same time.

Sickeroni · 2018-07-23T16:08:08Z

Can somehow explain me the fix in the first place? i didn't checked every change.
i just read commit https://github.com/MarlinFirmware/Marlin/tree/edb21f349ad18d2948ff6c313c6d43132bad5118
specially file planner.h
https://github.com/MarlinFirmware/Marlin/blob/edb21f349ad18d2948ff6c313c6d43132bad5118/Marlin/src/module/planner.h

variable flag of struct block_t ist volatile uint8_t and comment says its modified by ISR and main.
in planner.cpp line 867 i see that state changes to recalculate (set one specific bit)
then there is a check if the isr set the busy flag.

Isn't there still a race condition or am I just overlooking something?

possible fail on my side in my theoretical model:
sbi is a macro from macros.h thats SBI(n,b) (n |= _BV(b))
and marlin/fast.io #define _BV(PIN) (1UL << PIN)
other words #define SBI(n,b) (n |= (1 << b))

read n, 
bitwise-or a specific value (shifted 1)
write to n.

        SBI(current->flag, BLOCK_BIT_RECALCULATE);

        // But there is an inherent race condition here, as the block may have
        // become BUSY just before being marked RECALCULATE, so check for that!
        if (stepper.is_block_busy(current)) {
          // Block became busy. Clear the RECALCULATE flag (no point in
          // recalculating BUSY blocks). And don't set its speed, as it can't
          // be updated at this time.
          CBI(current->flag, BLOCK_BIT_RECALCULATE);

what could (in my optionion) happen in assembler

== main doing SBI with ==
main: reads flag from ram
main: bitwise-or our recalculationbit
== a wild interrupt appears ==
ISR: reads flag from ram (OLD FLAG-variable!)
ISR: bitwise-or busyflag to it
ISR: writes flag to ram(now with busyflag)
== wild interrupt fleed == 
main: write HIS modified flag to ram, overwrites ISR-flag-version

what am i overlooking here?
i would just disable interrupts for this specific block to be 100% sure.

thinkyhead · 2018-08-22T22:28:45Z

Isn't there still a race condition or am I just overlooking something?

@Sickeroni, @ejtagle — Can I assume we resolved this?

ejtagle · 2018-08-22T23:11:29Z

@thinkyhead : Yes, we resolved it
@Sickeroni : As you point out, SBI/CBI are NOT atomic operations. They are performed with a sequence of READ-MODIFY-WRITE operations,
If you look carefully at the implementation of stepper.is_block_busy(current), you will notice THERE IS NO BUSY BIT AT ALL.
I removed even the definition of the BUSY bit, to be absolutely sure NO ONE would attempt to revive/reuse it.

CBI/SBI are "safe" to use in block->flags, because from the ISR perspective, flags is READ ONLY. So, under such condition, the ISR will eventually detect if a bit has changed, but will never overwrite any bits. The only thread that can modify bits is the main thread.

To determine if a block is "BUSY", what we do is to verify if the current block is the last (the first to be removed) block in the planner block queue. This is safe, as that index into the queue is only altered by the stepper ISR itself.

Disabling interrupts adds extra latency to the already critical stepper ISR, it is a NO NO, and a lot of pain went into removing those unneeded critical sections

gloomyandy mentioned this pull request Jun 24, 2018

[bugfix-2.0.x] Bad forward extruder moves between retractions #11093

Closed

thinkyhead force-pushed the bugfix-2.0.x branch from 9035e3d to 5b49fcc Compare June 26, 2018 18:29

thinkyhead added PR: Improvement PR: Bug Fix labels Jun 26, 2018

thinkyhead mentioned this pull request Jun 27, 2018

[2.0.x] Poll all endstops, even when stationary #11123

Merged

thinkyhead changed the title ~~[2.0.x] Misc fixes to endstops and Planner and Stepper~~ [2.0.x] Misc fixes to Planner and Stepper Jun 27, 2018

thinkyhead mentioned this pull request Jun 27, 2018

[1.1.x] Fix stepper/planner block handling, race conditions #11136

Merged

Fix stepper/planner block handling, race conditions

ffcbdd7

- Allow planner to alter the deceleration phase of the currently executing block. - Remove BUSY flag, as it is NON ATOMIC to set bits in the Stepper ISR and Planner at the same time.

thinkyhead merged commit edb21f3 into MarlinFirmware:bugfix-2.0.x Jun 27, 2018

kAdonis mentioned this pull request Jul 1, 2018

[BUG] Ramps/Re-Arm losing steps with DRV8825 (layer shifts) #11047

Closed

[2.0.x] Misc fixes to Planner and Stepper #11098

[2.0.x] Misc fixes to Planner and Stepper #11098

Conversation

ejtagle commented Jun 23, 2018 • edited by thinkyhead Loading

smoki3 commented Jun 25, 2018 • edited Loading

nebbian commented Jun 25, 2018

p3p commented Jun 25, 2018 • edited Loading

thinkyhead commented Jun 25, 2018

thinkyhead commented Jun 25, 2018

Sineos commented Jun 25, 2018 • edited by thinkyhead Loading

ejtagle commented Jun 25, 2018

smoki3 commented Jun 25, 2018

Sineos commented Jun 25, 2018

ejtagle commented Jun 25, 2018

ejtagle commented Jun 25, 2018

smoki3 commented Jun 25, 2018

ejtagle commented Jun 25, 2018

ejtagle commented Jun 25, 2018

ejtagle commented Jun 25, 2018

teemuatlut commented Jun 25, 2018

p3p commented Jun 25, 2018

ejtagle commented Jun 25, 2018

p3p commented Jun 25, 2018

ejtagle commented Jun 25, 2018 • edited Loading

p3p commented Jun 25, 2018 • edited Loading

ejtagle commented Jun 26, 2018

Sineos commented Jun 26, 2018

ejtagle commented Jun 26, 2018 • edited Loading

iosonopersia commented Jun 26, 2018 • edited Loading

iosonopersia commented Jun 26, 2018

thinkyhead commented Jun 26, 2018

thinkyhead commented Jun 26, 2018 • edited Loading

ejtagle commented Jun 26, 2018

ejtagle commented Jun 26, 2018 • edited by thinkyhead Loading

p3p commented Jun 26, 2018 • edited Loading

ejtagle commented Jun 26, 2018 • edited Loading

thinkyhead commented Jun 26, 2018

thinkyhead commented Jun 26, 2018 • edited Loading

thinkyhead commented Jun 26, 2018

ejtagle commented Jun 26, 2018 • edited Loading

thinkyhead commented Jun 26, 2018 • edited Loading

ejtagle commented Jun 27, 2018

thinkyhead commented Jun 27, 2018

thinkyhead commented Jun 27, 2018

ejtagle commented Jun 27, 2018

thinkyhead commented Jun 27, 2018

ejtagle commented Jun 27, 2018

Sickeroni commented Jul 23, 2018 • edited by thinkyhead Loading

thinkyhead commented Aug 22, 2018

ejtagle commented Aug 22, 2018 • edited by thinkyhead Loading

ejtagle commented Jun 23, 2018 •

edited by thinkyhead

Loading

smoki3 commented Jun 25, 2018 •

edited

Loading

p3p commented Jun 25, 2018 •

edited

Loading

Sineos commented Jun 25, 2018 •

edited by thinkyhead

Loading

ejtagle commented Jun 25, 2018 •

edited

Loading

p3p commented Jun 25, 2018 •

edited

Loading

ejtagle commented Jun 26, 2018 •

edited

Loading

iosonopersia commented Jun 26, 2018 •

edited

Loading

thinkyhead commented Jun 26, 2018 •

edited

Loading

ejtagle commented Jun 26, 2018 •

edited by thinkyhead

Loading

p3p commented Jun 26, 2018 •

edited

Loading

ejtagle commented Jun 26, 2018 •

edited

Loading

thinkyhead commented Jun 26, 2018 •

edited

Loading

ejtagle commented Jun 26, 2018 •

edited

Loading

thinkyhead commented Jun 26, 2018 •

edited

Loading

Sickeroni commented Jul 23, 2018 •

edited by thinkyhead

Loading

ejtagle commented Aug 22, 2018 •

edited by thinkyhead

Loading