[Expo] Direct stepper chunk support #7012
It was missing MSG_FILAMENT_CHANGE_HEAT_2 and MSG_FILAMENT_CHANGE_HEATING_2
fixed spanish lang not compiling w filament change
Hmm, I forgot to add literally any description of the chunk format... The current chunk format is 256 bytes per chunk. Every 2 bytes describe 8 steps for 4 axes (XXXXYYYY ZZZZEEEE), letting each axis describe a step move delta of +/- 7 steps (we have to sacrifice a step here because we can't actually represent +/- 8 using 4 bits). Subtract 7 from each integer 'nibble' to get the step delta. When processing the chunk, each nibble is looked up in a small step table to get the step pattern for that delta (without using Bresenham counters). Unfortunately, direction needs to be checked every 8 steps, as it could change.
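As a rough illustration of the layout described above, here is a minimal decoding sketch. The names `SegmentDelta` and `decode_segment` are hypothetical, and the nibble order (first axis of each byte in the high nibble) is an assumption based on the XXXXYYYY ZZZZEEEE notation. A nibble value of 15 would decode to +8, which is outside the stated +/- 7 range; how that value is treated is not described here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Signed step deltas for one 8-step segment, in X, Y, Z, E order.
struct SegmentDelta {
    std::array<int8_t, 4> axis;
};

constexpr std::size_t kChunkBytes = 256;                     // from the description
constexpr std::size_t kSegmentsPerChunk = kChunkBytes / 2;   // 2 bytes per segment
constexpr std::size_t kStepsPerSegment = 8;                  // 128 * 8 = 1024 steps

// Decode the two bytes of one segment (XXXXYYYY ZZZZEEEE).
// Assumption: the first axis named in each byte occupies the high nibble.
inline SegmentDelta decode_segment(uint8_t xy, uint8_t ze) {
    // "Subtract 7 from each nibble to get the step delta."
    auto delta = [](uint8_t nibble) { return static_cast<int8_t>(nibble - 7); };
    return SegmentDelta{{
        delta(xy >> 4), delta(xy & 0xF),
        delta(ze >> 4), delta(ze & 0xF),
    }};
}
```

For example, the byte pair 0x7E 0x07 would decode to X = 0, Y = +7, Z = -7, E = 0 under these assumptions.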
I think there are a lot of problems with this format. e.g. How do you
handle comms errors? The format is specific and not extensible, i.e. it is
tailored for a printer with 1 extruder. What about multiple extruders? What
about non-printers, which might use XYZR?
It doesn't seem to make great use of bandwidth either, 25-50% of the data
will likely be always zero. I've also seen other people doing similar things
but with different encodings, so then the question is which encoding to
support? Or do we support multiple encodings?
@bobc So comms errors are actually handled correctly: each chunk is sent with a checksum, and Marlin will eventually respond with "!ok X", "!fail X", or "!busy". Pretty similar to the gcode pipeline. The responses are different so you can effectively multiplex the command streams (chunks and gcode).
The response pipeline also lets you "batch" send as many chunks as you want, even before you get a response. This is the key to getting the full available bandwidth; I just have not implemented that in step-daemon yet because it's more complex than "one at a time" (due to error handling). That's where the 50% bandwidth comes into play; if you did it better you could achieve close to the line rate. Plus currently, if you wanted to run your chunks at 50k steps/s (impossible in Marlin right now), you'd only need an effective bitrate of about 100kbps (easily covered with a 250kbps line rate).
As for when the "chunk buffers" are full in Marlin: it starts responding with "!busy", in which case the external planner (step-daemon in my case) will take a calculated pause. So it does definitely waste some bandwidth here and there, but only when the chunk buffers are full up.
So, the encoding is definitely fixed at 4 stepper motors. It looks like Marlin is really set up to handle 4 steppers, except with MIXING_EXTRUDER enabled. So I think that covers the majority, but for mixing extruders, we'll probably need another format.
Which brings me to formats. I think ideally there should be some standards for formats. Think of bitmaps: bitmaps have a pretty set format (usually just a flat buffer with idx = x + y * width), but there are several actual pixel encodings available that affect how much information can be stored in each 'pixel', without really affecting the format. So basically, an expandable format with different 'pixel' encodings would probably be best.
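To make the reply handling concrete, here is a minimal host-side sketch of how a planner might separate chunk replies from ordinary g-code replies. The names `ReplyKind`, `ChunkReply`, and `parse_reply` are hypothetical, and treating "X" as a decimal chunk index is an assumption; the thread only gives the reply keywords.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

enum class ReplyKind { Ok, Fail, Busy, Other };

struct ChunkReply {
    ReplyKind kind;
    bool has_index;  // true when a number followed the keyword
    long index;      // the "X" field; assumed to be a decimal chunk index
};

// Classify one line read from the serial port. Lines that do not start with
// a chunk keyword are left for the normal g-code reply handling (Other).
inline ChunkReply parse_reply(const std::string& line) {
    auto starts_with = [&](const char* prefix) { return line.rfind(prefix, 0) == 0; };
    auto with_index = [&](ReplyKind kind, std::size_t skip) {
        const char* start = line.c_str() + skip;
        char* end = nullptr;
        const long value = std::strtol(start, &end, 10);
        return ChunkReply{kind, end != start, end != start ? value : 0};
    };
    if (starts_with("!ok")) return with_index(ReplyKind::Ok, 3);
    if (starts_with("!fail")) return with_index(ReplyKind::Fail, 5);
    if (starts_with("!busy")) return ChunkReply{ReplyKind::Busy, false, 0};
    return ChunkReply{ReplyKind::Other, false, 0};
}
```

Because every chunk reply starts with '!', a plain "ok" from the g-code pipeline falls through to `Other`, which is what allows the two streams to share one serial line.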
As far as the "wave" tables (step table) I have, and the 8-bit resolution, that is a compromise that may be different for certain device situations. For this use case, I'm assuming 8 or 16 division microstepping. The actual step curve for microstepping can be rather odd, so it seemed like a great place to hide the "8 step lines" that the chunks boil down to. The lines should be multiple steps, as otherwise we waste cycles and bandwidth by having to encode and check more direction data (a single 1024 step chunk can change directions 128 times max per axis).
EDIT: and at 15k steps/sec, there's really no noticeable harmonic noise. The wave tables introduce a tiny bit of noise, but I wouldn't say it's worse than the noise produced with plain Marlin. Just sounds a little different. I initially had Marlin introducing some entropy into the wave table positioning, but it made a rather displeasing white noise. The wave tables I have each produce a discrete, but not horrible, tone. I think the Bresenham line algo basically produces other similar "coherent" noise.
EDIT2: Also, this format makes extensive use of 8-bit processing for performance's sake. All the math involved in "playing" the chunks is based on 8-bit bitwise operations.
I think you could easily make the format extensible, e.g. have a chunk
header that specifies axis letter and perhaps bit size per axis. A chunk
header would be a good idea anyway to easily allow different encodings to
be added. Alternatively, a "format spec" packet could be sent to configure
setup for following chunks.
For an example of other firmware doing this sort of thing, have a look at
Klipper https://github.com/KevinOConnor/klipper/blob/master/docs/Overview.md
Test video showing some of the motion, and you can hear the sounds. Definitely a bit different from Marlin, but not loud at all. Video was done at 15k steps/s for the chunks (20 and 30 are even better). My squeaky z-axis during z-lift is the loudest part. That's my general noise benchmark so far ;)
https://youtu.be/lDJzmHbLk6M
@bobc yea that's definitely a good idea. I think maybe some fixed (standard) formats would be good tho, just so you can provide optimized pipelines. Or at least fixed standards in Marlin itself: on 8-bit boards you're already so restricted for processing time and space (RAM and application memory) that you could maybe only have 1 or 2 encodings enabled at a time, and some of them will be more optimal (and possible) on that particular board.
This Klipper project is interesting, didn't know about that. I initially thought of trying to do it that way, but the idea of writing your own firmware is incredibly scary lol. I made it a hard goal for this project that my printer wasn't really more likely to catch on fire than when running just Marlin. The Klipper people are brave for attempting that. Besides, the implementation I needed really fit just fine with the normal G-code pipeline. The external planner basically only does 2 things: a) controls the g-code pipeline and monitors when it needs to sync positions with the printer, like after homing; b) takes large segments of G0 and G1 commands and uses the chunk pipeline to ultimately turn them into C0 commands (backed by chunk buffers). Turning them into C0 commands ultimately requires the planning and stepping logic.
EDIT: Even the auto bed leveling is still done by Marlin; the external planner just scrapes the probe info from serial, and feeds it into its own algos in the case of step-daemon (bicubic for the win!)
There is also Pacemaker https://github.com/JustAnother1/Pacemaker and some
other interesting approaches https://github.com/JustAnother1/Pacemaker such
as spline curves, which would help with arcs and curves generally.
Although, tbh, not really sure what the advantages of this method are.
Overall step rate does not seem to be higher, and it just ends up using
more comms. It also ties the controller to a host, although if you pair a
host such as RPi or ESP8266 with an MCU it makes more sense.
Marlin/stepper.cpp
Outdated
const uint8_t dE = b & 0xF;

uint8_t steps[4] = { 0 };
steps[X_AXIS] = block_moves[dX][(block_steps + 0) & 0x7];
The 0,2,4,6 offsets here just add a permanent offset to the wave tables for each axis. Probably not needed and just wasting cycles, but the theory there was to prevent noise. If all axes are stepping at the same rate, we want the pulses to stagger a bit. But the chances of them all stepping at the same rate are probably low.
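The lookup with a per-axis phase offset can be sketched as follows. The actual `block_moves` table was tuned by hand and is not shown in this thread, so `make_even_table` below is a hypothetical stand-in that simply spreads the pulses evenly; `step_for` and its parameter names are likewise assumptions for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstdlib>

// 16 rows (one per nibble value), 8 slots (one per step tick in a segment).
using WaveTable = std::array<std::array<uint8_t, 8>, 16>;

// Hypothetical table: row n pulses |n - 7| times across 8 slots, evenly spread.
// This is NOT the hand-tuned table from the PR.
inline WaveTable make_even_table() {
    WaveTable table{};
    for (int n = 0; n < 16; ++n) {
        const int pulses = std::abs(n - 7);
        for (int slot = 0; slot < 8; ++slot) {
            // Emit a pulse whenever the running total crosses an integer.
            const int before = (pulses * slot) / 8;
            const int after = (pulses * (slot + 1)) / 8;
            table[n][slot] = static_cast<uint8_t>(after - before);
        }
    }
    return table;
}

// Look up whether an axis pulses on this tick. The phase offset (0, 2, 4, 6
// in the snippet above) rotates the pattern so axes do not pulse in lockstep.
inline uint8_t step_for(const WaveTable& table, uint8_t nibble,
                        uint8_t block_steps, uint8_t phase_offset) {
    return table[nibble & 0xF][(block_steps + phase_offset) & 0x7];
}
```

Since any 8 consecutive values of `block_steps` visit every slot exactly once, the offset changes when the pulses occur within a segment but not how many there are.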
Marlin/stepper.cpp
Outdated
#define UPDATE_DIR(AXIS) \
if(d## AXIS == 0) {} \
else if(d## AXIS < 7) SBI(dm, AXIS ##_AXIS); \
I wish there was a faster way to do this.
// Stop an active pulse, reset the Bresenham counter, update the position
#define PULSE_STOPC(AXIS) \
if (steps[_AXIS(AXIS)]) { \
count_position[_AXIS(AXIS)] += count_direction[_AXIS(AXIS)]; \
In theory this should let us resume Marlin-based planning after doing chunks, provided the planner syncs the "real" position back to it.
Marlin/chunk_support.cpp
Outdated
#define BLINK_LED LED_BUILTIN

//wave tables, 4 bit move, +/- 7
uint8_t block_moves[16][8] = {
Wave tables for each step frequency, done by hand. This is my third version for this encoding; it seems to be the least annoying noise-wise.
@bobc the main advantage here is that it allows you to get 32-bit performance (and more) out of hardware you probably already have: an 8-bit or above Marlin-compatible board, and a Raspberry Pi. The step-daemon project is targeted specifically at the RPi, and will offer a virtual serial interface letting you use OctoPrint with it. Basically, you can use hardware you already have on hand, and allow your device to handle kinematics and other advanced features past what you would normally be able to do with Marlin alone on that board. All that, plus hopefully allowing Marlin to go at a faster step rate because it has nothing to do other than handle the serial ISR, temp ISR, and stepper ISR. No planning, no real math, no bed leveling, no line counters, no linear advance; it literally has to just handle the realtime components. And all this is possible with a daemon that lives on your RPi and takes maybe 10-20% CPU.
The step-daemon software uses full 64-bit precision floating point, and real vector math and linear algebra to solve planning problems. The RPi has really impressive floating point support, even for 64-bit precision. An example I gave out earlier: the math required for processing at 20k steps/sec is equivalent to simulating a 500-particle 3D system at 60fps. Which is pretty simple; the RPi can run Minecraft, for example ;)
"the main advantage here is that allows you to get 32-bit performance (and
more), out of hardware you probably already have."
30 kHz is no faster than current Marlin.
Sorry, that just sounds like marketing nonsense. One can easily get 100kHz
out of a cheap 32 bit CPU. The limitations on CPU performance are mostly
the comms and the stepper ISR, which you still have. An advantage is that
the firmware is simpler, but since we are already starting with Marlin that
is not relevant.
I'm afraid to say this seems to be another feature implemented "just
because we can". You haven't demonstrated any real need or advantages. I
kind of suspect you would be the only user of it.
I find it impossible to believe that the CPU limitations are based solely on the ISR. You have 32-bit floating point math, long math, int math, tons of things happening routinely that really have no business being on an 8-bit processor. Hell, there are even long divides. This solution is virtually no different from what Smoothie or the Replicape do: they have an advanced processor that produces sequences for the real-time low-bit cores. Where they have the advantage of DMA and similar, we have to suffer through serial transfer. But I have already proven that 250kbps of raw transfer (through the chunk pipeline) sacrifices no more than 15% of available CPU time on an ATmega2560 (16 MHz), so there's really nothing stopping this from reaching higher step rates. A 100kbps effective serial rate (which would be needed for 50k steps/s) means 12.5k ISRs a second with hardware serial. That's virtually nothing.
I'm sorry you can't see the value in this, and unfortunately this means I'll have to produce more "marketing nonsense" to showcase its worth. But the fact of the matter is, we're generally trying to run something as complex as "Doom" on the equivalent of a calculator CPU, and that's where I feel the true nonsense is. You can be orders of magnitude more complex and exact with your calculations as soon as you offload this stuff. Just like slicing is a natural division of duties in 3D printing, I believe planning and stepping is too.
Also, I'd like to note that the hard limit in Marlin of 40k steps/second is based on stepper driver limitations. For core/MVP support, that's not a limit I really want to push.
EDIT: also since submitting this, I've refined the chunk stepper routine to use about half the cycles it previously did. Real numbers soon.
EDIT2: Erm, and sorry, a TI-86 runs at 6 MHz. So I guess an ATmega2560 would be a bit less than 3 of those, but with far more RAM.
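The 100kbps and 12.5k figures follow directly from the chunk format (256 bytes carrying 1024 steps). A minimal sketch of that arithmetic, with hypothetical function names, counting 8 data bits per byte and ignoring UART framing bits and protocol overhead (both assumptions):

```cpp
#include <cstdint>

constexpr std::int64_t kChunkBytes = 256;   // bytes per chunk
constexpr std::int64_t kChunkSteps = 1024;  // steps described by one chunk

// Payload bytes per second needed to sustain a given step rate.
// Assuming one serial interrupt per received byte, this is also the ISR rate.
constexpr std::int64_t payload_bytes_per_second(std::int64_t steps_per_second) {
    return steps_per_second * kChunkBytes / kChunkSteps;
}

// Payload bits per second, counting 8 data bits per byte
// (start/stop bits and reply traffic are not included).
constexpr std::int64_t payload_bits_per_second(std::int64_t steps_per_second) {
    return payload_bytes_per_second(steps_per_second) * 8;
}
```

At 50,000 steps/s this gives 12,500 bytes/s and 100,000 bits/s, matching the numbers quoted above.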
I'm very interested to look more closely at the code and see what's going on, and I will do so now… I've rebased your branch onto the current To replace your branch with the contents of mine:
You could then use your branch to make a new PR targeted at |
@thinkyhead great! I'll probably rebase (and open a new PR) rather soon. I have some pending changes that should help with performance, but I wanted to push this through as it's basically what I used for my last test. Once I get the new changes tested, I'll merge and rebase. Also going to add a note on one line of the PR here (for the temp ISR): I apparently did push that line untested, and I don't really know what the effects are currently.
Marlin/stepper.cpp
Outdated
//sync planner position back up with stepper positions
Planner::sync_from_steppers();
//temperature ISR can get drowned out under high step rate, make sure it gets run
Temperature::isr();
!! This line has not been tested !! It snuck into my PR.
It's probably not 100% safe. We've already done some tweaking to leave space for the serial UART, and have improved the reliability of Temperature::isr generally, but if the stepper ISR is run at too high a rate, the temperature ISR does get missed. There is a flag indicating that the temperature ISR is active, so if the stepper ISR is interrupting it then that can be checked.
At that point you'll need to close this PR and open a new one, because you can't re-target a PR to a different branch. |
@thinkyhead awesome, yea this PR will be a throwaway. Completely conceptual (but proven) right now, just for the proof of concept and RFC. The next one I post should be a functional example; this one is more to get some rather raw input. The code itself is mostly garbage water, with not much thought put into where I put globals and header fodder. Think I switched between snake and camel case a few times. Will probably use the same branch; I suppose there's not much point in keeping this one open after I move to the new one.
If you can provide any hard data to sway the skeptics, that will go a long way towards justifying this concept. I haven't yet absorbed the full picture, but it has all the hallmarks of an optimization… |
I've also developed a nice proof-of-concept that uses two RAMPS boards, both running Marlin, to share resources by coordinating via i2c. This is another easy way to take advantage of inexpensive hardware. The second Marlin instance may be completely stripped down. In my proof-of-concept the "slave" board handles all Z movement for 4 independent Z stepper motors, while the "master" board does everything else. Other divisions of labor should easily be possible, so you might have one Marlin running the UI and sensors while the other one only handles motion. In any case, offloading computation across more inexpensive boards is a great way to go, and can help extend the life of 8-bit boards even as the 32-bit boards come down in price. |
@thinkyhead awesome! Yea, I think there's a great benefit to multiprocessing in general here, in almost any form. Even the external planner software (step-daemon) I wrote for this uses discrete pipeline processing for each stage of the handling, in this case to make sure the work is distributed properly across processes on a multi-core CPU. Marlin is just the last stage in the pipeline. It could conceptually be spread across devices too. More power is more power, as long as the serialization and transmission isn't more costly ;) Anyways, I just figured it would be awesome if I could cram my RPi into the pipeline, given how capable a system it is, and how common it is with 3D printing anyways. I agree on the PR. I wanted to get this in to "prime the pump" so to speak; the next PR I intend to be complete, hopefully with a corresponding functional version of step-daemon so people can test. And yes, lots of numbers!
Is this overall tech meant to be packaged as a plugin for OctoPrint, given that it's the de facto RPi printing host?
It could be. That touches on lots of usability issues, which is definitely something that I've been thinking about. I think it's perfectly reasonable that this could be deployed as an OctoPrint plugin. Step-daemon is a Java application, and Raspbian should come with the Oracle JVM available. On the other hand, step-d could benefit from real-time or high-nice priority (which is more easily done with a system-level launcher). It should be able to delay up to about 10ms without really missing a beat. It can deal with Java JIT and GC, so it should technically be happy with the longish pauses that will happen on a busier system. I ideally see it as something that would be deployed with OctoPrint tho.
The idea of interfacing at the block level is good, as it keeps everything in the same destination queue. If there are unused variables from the normal stepper block, then they can be combined with the chunk data as a I haven't seen the portion of the code where it synchronizes back to the high-level
I know you're still working on this, but as I've been studying your code, I've also updated my branch with some elements that you should include in your update. Most important is making |
Great! Yea I'll probably combine any extra fields I need with a union. I'm thinking of simplifying the bitwise math a bit more just to use counters, so I'll need some extra fields somewhere, although maybe as statics. So far there's just the one extra 8-bit field; I'll probably union that with one of the unused planner fields.
The only position syncing done on the Marlin side right now is syncing the stepper positions back to the planner after executing a chunk. There is no real automatic cartesian syncing on the Marlin side, just because bed leveling and all the translation will be on the external planner. Not sure Marlin has enough information to calculate the real cartesian point after running chunks. Currently I do have step-daemon syncing positions after detecting a few different g-codes in the pipeline (like after homing etc). I'll have to make sure it catches all the possible conditions in which it needs to sync, maybe something periodic too (right now the LCD just never updates its position during printing, which is somewhat annoying). But there are many g-code entry points into the coordinate systems, which is great, and has made it really easy to do pure g-code solutions for most things, just watching, inserting and modifying the g-code pipeline as it goes by.
I really appreciate the branch updates! I feel bad I didn't get those done in the first place. I'll make sure I get that merged and get a cleaner version done tonight (and probably a new PR, with the new basing). I'll try to get some better comments in there too.
I think LIN_ADVANCE could be implemented to work with the chunk system. Something like bed leveling, for example, is basically impossible because the external planner takes on most of the cartesian handling role, but LIN_ADVANCE (from what I've seen) seems to be just stepper based, so I think it could work. I've actually looked at it a bunch, but I don't really have a good mental picture of how the implementation is done.
But, that's step-daemon. That project may in fact go nowhere at all lol. I want to focus on it more as a proof of concept for this firmware extension. I think it's worth it to consider LIN_ADVANCE compatibility with the extension; I'd like to preserve as much pure Marlin functionality with it as possible. TL;DR: No linear advance yet, but I think it should be done if possible.
Marlin/stepper.cpp
Outdated
unsigned char dm = last_direction_bits;

#define UPDATE_DIR(AXIS) \
if(d## AXIS == 0) {} \
This line is plain wrong. It should be
d## AXIS == 7
just so no bit is set or removed for direction. Index 7 is 0 steps (dX = index - 7). As written, this would cause it to change the direction unnecessarily often.
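The corrected check can be sketched as a plain function rather than a macro. `update_dir_bits` is a hypothetical name; the "set the bit for a negative delta" branch follows the `SBI` line in the diff, while clearing the bit for a positive delta is an assumption, since that branch of the macro is not shown in this thread.

```cpp
#include <cstdint>

// Update one axis bit in the direction mask from a raw 4-bit delta index.
// nibble == 7 encodes a delta of 0 (delta = nibble - 7), so the previous
// direction is kept and no direction change is triggered.
inline uint8_t update_dir_bits(uint8_t dm, uint8_t axis, uint8_t nibble) {
    const uint8_t mask = static_cast<uint8_t>(1u << axis);
    if (nibble == 7) return dm;                                  // zero delta: leave as is
    if (nibble < 7) return static_cast<uint8_t>(dm | mask);      // negative delta: set bit
    return static_cast<uint8_t>(dm & static_cast<uint8_t>(~mask)); // positive: clear (assumed)
}
```

With the original `== 0` test, a zero-delta segment (index 7) would fall through to the positive branch and could flip the direction bit for an axis that is not moving at all.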
Marlin/stepper.cpp
Outdated
steps[Z_AXIS] = block_moves[dZ][(block_steps + 4) & 0x7];
steps[E_AXIS] = block_moves[dE][(block_steps + 6) & 0x7];

//start of block, check direction
Ugh, so inside this routine, when I say 'block', I mean '8 step line'. I doubled up on the conflicting meaning here (with the Marlin block_t). I'll probably rename all these vars to use 'segment' instead of 'block'.
moving over to #7047 |
This is an exposition PR intended to get feedback on a possible future addition to Marlin. I expect lots of input and there will be future changes, proper branch basing, etc. It should eventually just get closed out and not merged. (proof of concept?)
This feature adds a new G-code and serial extension that allows an external service to send direct step buffers to Marlin via USB serial. The intention here is to allow users that have the standard 8-bit control board and a more powerful external device to use the more powerful device to handle planning and step sequencing (the Raspberry Pi is the target hardware here).
This feature addition allows an external device to concurrently upload chunks of 1024 steps to the device, and trigger their sequencing by using a new G-code command (currently, 'C0'). The protocol for updating chunk buffers themselves is a binary protocol that starts a packet using the control character '!'. This character is not used elsewhere in g-code (input anyways), and allows low-level processing of the serial sequence- enabling buffering independent of the command parser (all handled in the ISR).
C0 command format:
C0 I[chunk start index] R[number of chunks, defaults to 1] S[steps per second]
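As an illustration of the command layout above, here is a minimal parsing sketch. `C0Command` and `parse_c0` are hypothetical names, not Marlin's actual parser, and the handling of malformed or missing parameters is an assumption; only the R default of 1 comes from the format description.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

struct C0Command {
    long start_index = 0;       // I: chunk start index
    long count = 1;             // R: number of chunks, defaults to 1
    long steps_per_second = 0;  // S: step rate (0 here means "not given")
    bool valid = false;         // true if the line parsed as a C0 command
};

// Parse a line such as "C0 I4 R2 S15000".
inline C0Command parse_c0(const std::string& line) {
    C0Command cmd;
    std::istringstream in(line);
    std::string word;
    if (!(in >> word) || word != "C0") return cmd;
    while (in >> word) {
        if (word.size() < 2) return cmd;  // a letter with no value
        char* end = nullptr;
        const long value = std::strtol(word.c_str() + 1, &end, 10);
        if (*end != '\0') return cmd;     // trailing garbage in the number
        switch (word[0]) {
            case 'I': cmd.start_index = value; break;
            case 'R': cmd.count = value; break;
            case 'S': cmd.steps_per_second = value; break;
            default: return cmd;          // unknown parameter letter
        }
    }
    cmd.valid = true;
    return cmd;
}
```

Under these assumptions, "C0 I4 S15000" would run one chunk starting at index 4 at 15,000 steps/s, since R is omitted and falls back to 1.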
The execution of the chunks is done by extending the Marlin block format with a field and flag that lets it execute the buffered chunk instead of looking for the normal trapezoid-related parameters. The step speed is configurable; I've had success with 10k-30k steps/s, although 30k seems to starve the temperature ISR, causing runaway errors.
The bandwidth/load limitations and whatnot were tested in advance, and the device seems capable of running healthy at 500kbps. 250kbps is probably also fine. My testing seemed to benchmark stable transfer bitrate at about half of the line bitrate (due to waiting for responses etc). This was with my test planner that doesn't implement the buffering pipeline as optimally as it could, otherwise effective bitrates could be closer to the max.
My external test planner: https://github.com/colinrgodsey/step-daemon
I finally got to a point where I could print a blazing fast 120mm/s benchy that seemed to provide the same dimensional accuracy as what I would normally get from Marlin (board is an MKS Base v1.4, ATmega2560). So, figured it's time I start cramming things down "ye ol' open source pipeline".