Print Stall / Print Stop (no faults reported - watchdog resets) vrs 1.1.3 and ZRIB board #6992
One thing that would be helpful is to try the same print from an SD memory card with the printer disconnected from anything else. If it still freezes, we know the problem is in the printer's firmware and not a communication issue with the host computer. Also... I doubt you are running out of memory... But after the printer has been running and doing things for a little while, how much free memory does M100 find?
In fact, I first noticed this while printing the perpetual wheel mentioned from my SD card (numerous times :)
The M100 log I received was included with logs 3 and 4. I can't recall at the moment, but it was not varying at all; every response had the same values.
It wasn't clear to me which files were the log files. We want 2KB of free memory just because that will rule out the possibility of running out of memory.
I don't know what 'Fact I' is. But up until I posted a reply, the only mention of an SD Memory card was this:
Seriously... People are not going to read a "War and Peace" novel just so they can have the privilege of being rebuked when they try to help you.
When you're getting the "start" message, that means the controller has gone through a reset. If there are no error messages before the start message, then most likely it's a watchdog timer timeout. Most of those happen because the printer is overloaded and can't do the thermal management at the same time as moving the steppers. The only solutions I know of are to lower the feed rates, use fewer microsteps, or disable options like Linear Advance or UBL leveling.
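Why lower feed rates and fewer microsteps help can be seen with a rough back-of-the-envelope model. This is a sketch under stated assumptions, not Marlin code. It assumes the watchdog is serviced from the main loop and that the stepper interrupt pre-empts that loop. Under those assumptions, the main loop only gets the CPU time the stepper interrupt leaves over. The function names and the cost numbers in the usage note are hypothetical.

```cpp
#include <cassert>

// Hypothetical model, not Marlin code. The stepper ISR pre-empts the main
// loop, so the main loop only gets (1 - isr_load) of the CPU. If the main
// loop cannot get back to resetting the watchdog within timeout_ms, the MCU
// resets silently and the host only sees "start".
//   loop_work_ms : CPU time one main-loop pass needs (heaters, LCD, serial)
//   step_rate_hz : stepper interrupts per second (feed rate x microsteps)
//   isr_cost_us  : CPU time spent in each stepper interrupt
double main_loop_wall_ms(double loop_work_ms, double step_rate_hz, double isr_cost_us) {
  assert(loop_work_ms >= 0.0 && step_rate_hz >= 0.0 && isr_cost_us >= 0.0);
  const double isr_load = step_rate_hz * isr_cost_us / 1e6;  // fraction of CPU used by the ISR
  if (isr_load >= 1.0) return -1.0;  // ISR eats the whole CPU: the main loop never finishes
  return loop_work_ms / (1.0 - isr_load);
}

// True if one main-loop pass would overrun the (assumed) watchdog timeout.
bool watchdog_would_fire(double loop_work_ms, double step_rate_hz,
                         double isr_cost_us, double timeout_ms) {
  const double wall = main_loop_wall_ms(loop_work_ms, step_rate_hz, isr_cost_us);
  return wall < 0.0 || wall > timeout_ms;
}
```

With a made-up 40 us per stepper interrupt, 12,500 steps/s uses half the CPU, so a 10 ms main-loop pass takes 20 ms of wall time. At 25,000 steps/s the ISR would use all of it, and the watchdog fires no matter how long the timeout is. Halving the feed rate or the microstepping halves step_rate_hz.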
One easy way to get back a bunch of CPU cycles is to bump up the number in #define LCD_UPDATE_INTERVAL 100. The LCD panel will become a little sluggish, but especially if you are not really using the LCD panel, it doesn't matter. All those freed cycles can go to doing real work.
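The effect of the interval can be illustrated with a millis()-style throttle, where the expensive redraw only runs once the interval has elapsed. This is a minimal sketch with a simulated clock so it runs on a desktop. It is not Marlin's LCD code, and count_lcd_refreshes is a hypothetical name. It counts how many redraws happen over a minute for a given interval.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical throttle sketch, not Marlin's code. The main loop polls every
// poll_ms for duration_ms, but the (expensive) LCD redraw is allowed at most
// once per interval_ms. Uses a simulated clock and ignores millis() wraparound
// for simplicity. Returns the number of redraws performed.
uint32_t count_lcd_refreshes(uint32_t duration_ms, uint32_t poll_ms, uint32_t interval_ms) {
  assert(poll_ms > 0 && interval_ms > 0);
  uint32_t next_update_ms = 0, refreshes = 0;
  for (uint32_t now = 0; now < duration_ms; now += poll_ms) {
    if (now >= next_update_ms) {         // interval elapsed: time to redraw
      next_update_ms = now + interval_ms;
      ++refreshes;                       // the LCD draw would happen here
    }
  }
  return refreshes;
}
```

Over 60 seconds this gives 600 redraws at 100 ms, 200 at 300 ms and 60 at 1000 ms. Going from 100 to 1000 ms therefore removes about 90% of the LCD drawing work.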
I totally came across incorrectly. I was stunned at the speed of your response and was so happy; my mistake was my wording. I am very interested in your input. I never intended to come across as rebuking you.
Excellent, I will set my LCD_UPDATE_INTERVAL and report back.
It's funny you mention the feed rates; I was leaning toward that after I realized I could get further into the print before stalling if I reduced my speeds by half. I have disabled the Linear Advance option as well, so I concur with your post. I will, however, do as you say and disable any UBL. I believe I am using BiLinear at the moment.
Accepted...
It will make the LCD Panel a little bit 'sluggish'. But the LCD Panel will still work as expected and do everything you tell it to do. And if you are printing with a remote host... for how often you actually use the LCD Panel, it really doesn't matter.
The LCD_UPDATE_INTERVAL was set to 300, and I also set microstepping down to 2. The print stalled. I took a chance and set the LCD_UPDATE_INTERVAL to 1000 ms, and for the first time one of my most stall-challenged prints was successful. Thank you. What a relief. I still have a few questions regarding this problem.
I can answer the M100 question. The answer is "No!" M100 is passive unless you ask it for information. M100 initializes the free memory area with 0xE5 values. When M100 is looking for corruption, it is looking for non-0xE5 values in the middle of the free area. The 'free memory' area is simply the block of solid 0xE5 values. The stack will eat at that block from one direction, and the heap will eat at it from the other direction. Right now, there should be no heap activity. Once you print a layer of your file, you can pause the print and send an M100 F. If the free memory is greater than 2KB, you can turn the M100 code (option) off. Or you can just edit your G-code file to add an M100 F at each layer change and watch your host to see what it reports.
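The fill-and-scan idea behind M100 can be modeled on a desktop. Memory is pre-filled with 0xE5 and "free memory" is the longest unbroken run of that value. This is a sketch of the technique described above, not Marlin's M100 implementation. longest_free_run and simulate_ram are hypothetical names, and the RAM image is simulated. On real hardware, heap or stack data that happens to equal 0xE5 would slightly inflate the count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical desktop model of the fill-and-scan technique; not Marlin code.
constexpr uint8_t kTestPattern = 0xE5;

// Length of the longest contiguous run of the test pattern ("free memory").
std::size_t longest_free_run(const std::vector<uint8_t>& ram) {
  std::size_t best = 0, run = 0;
  for (uint8_t b : ram) {
    run = (b == kTestPattern) ? run + 1 : 0;  // a non-0xE5 byte breaks the run
    if (run > best) best = run;
  }
  return best;
}

// Builds a simulated RAM image: the heap grows up from the bottom, the stack
// grows down from the top, and everything in between still holds 0xE5.
std::vector<uint8_t> simulate_ram(std::size_t size, std::size_t heap_used, std::size_t stack_used) {
  assert(heap_used + stack_used <= size);
  std::vector<uint8_t> ram(size, kTestPattern);
  for (std::size_t i = 0; i < heap_used; ++i) ram[i] = 0x00;             // heap data
  for (std::size_t i = size - stack_used; i < size; ++i) ram[i] = 0x11;  // stack frames
  return ram;
}
```

For a 4096-byte image with 200 bytes of heap and 300 bytes of stack, longest_free_run reports 3596 bytes, comfortably above the 2KB threshold mentioned above. A single stray byte in the middle of that block would split the run. This is the kind of "non-0xE5 value in the middle of the free area" that M100 treats as possible corruption.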
Thank you for that insight into M100. I will continue to monitor by saving my OctoPi serial logs with M100 F responses.
I'm actually getting a very similar issue with version 1.1.3 (I'm running bugfix-1.1.x as of this afternoon): the printer just hangs at some point during the print. I don't get any error message or warning, and I can pause the print and send move commands in Pronterface. I will try changing the LCD_UPDATE_INTERVAL as suggested by @Roxy-3D and let you know if that's better, but second x-carriage that hang after a few hours in. Here is my config if it's of any use. I'm running a normal RAMPS 1.4 board with a custom panel (with an SSD1306 and a rotary encoder).
This is a brute force way to lighten the load on the CPU. If we are running into trouble because the CPU is always busy... We need to find out why and fix it. If doing this brute force 'solution' lets you print, Great! But that doesn't mean the problem is fixed! |
Hi Roxy, yeah I get that. If that's the issue and this print works, I will roll back to my last working version and we can do a diff to see what has changed between then and now.
I have been cautiously optimistic. I completed this problematic print once, which is a huge step forward for me, thank you @Roxy-3D. My second attempt at the same print failed. As I've found, it always fails within 1-3 attempts. I thought I was going out of my mind because I couldn't nail it down to just one print, or even better, to the same location each time! I contacted @filipmu and he enlightened me to a possibility, namely "interrupts". I am sure it has to do with the number of things going on at once during the print, and I am sure it is not the Arduino-based board I am using. My next attempt is to remove PID from my heated bed temperature control and remove my SD card support. Hopefully we can get to the bottom of this.
And just to be clear... I do think the interrupt code works. But if the CPU is too busy and an interrupt fires while the processor is in the (previous) interrupt code, the stack starts to wind up. If that happens again and again and again... Eventually the processor uses up all the free memory. And the firmware will crash. So if that is happening... We need to figure out why. But reducing how often the LCD Panel gets updated will free up cycles. |
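The stack wind-up described above can be simulated at 1 us resolution. Each invocation of the handler stays on the stack until it finishes. Because the newest invocation always runs first, a handler that takes longer than its own period never lets the older ones finish. This is a hypothetical sketch, not Marlin code, and the timings in the usage note are made up. It also assumes the handler re-enables interrupts; on an AVR, nesting only happens if it does, because the global interrupt flag is cleared on ISR entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-us-resolution simulation, not Marlin code. An interrupt fires
// every period_us and needs duration_us of CPU time. We assume the handler
// re-enables interrupts, so a new interrupt pre-empts the one in progress and
// the newest invocation always runs first (LIFO, like a real call stack).
// Returns the deepest nesting reached during total_us.
std::size_t peak_nesting(int period_us, int duration_us, int total_us) {
  assert(period_us > 0 && duration_us > 0 && total_us >= 0);
  std::vector<int> remaining;  // remaining work of each active (nested) invocation
  std::size_t peak = 0;
  for (int t = 0; t < total_us; ++t) {
    if (t % period_us == 0) remaining.push_back(duration_us);  // interrupt fires
    if (remaining.size() > peak) peak = remaining.size();
    // Only the innermost invocation runs; pop it when its work is done.
    if (!remaining.empty() && --remaining.back() == 0) remaining.pop_back();
  }
  return peak;
}

// Stack consumed at a given nesting depth (frame_bytes per level is an assumption).
std::size_t stack_used(std::size_t depth, std::size_t frame_bytes) {
  return depth * frame_bytes;
}
```

With a 10 us period, a 5 us handler never nests. A 15 us handler adds one level every period: 10 levels after 100 us, and the count keeps growing for as long as the overload lasts. That keeps eating the free 0xE5 block until the stack meets the heap.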
@Roxy-3D your explanation makes complete sense. Would there be a way to break up each interrupt handler using some type of allow-buffer-to-empty wait command? Or should I try using M400 at every layer change after my M100 F? It might print, and it may allow for some type of proof of concept.
You could do that. That would be valuable information. I'm in the middle of trying to get the LCD Panel to do UBL Mesh Editing for 20x4 displays, so I'm kind of distracted. But what I would do is go add a few lines of code to every interrupt handler so it bumped a counter. And I would have it decrement the counter when it left the ISR. And if that counter ever was set when the interrupt handler started, we know we have recursion happening. And that would provide proof this is the problem. |
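The counter Roxy describes can be sketched as a small guard: increment on ISR entry, decrement on exit, and record a re-entry whenever the counter is already non-zero on entry. The names below (IsrNestingMonitor, IsrScope, simulated_isr) are hypothetical, not Marlin's. The recursion in simulated_isr only stands in for an interrupt arriving mid-handler so the example runs on a desktop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instrumentation; names are illustrative, not Marlin's.
// On an AVR these would be volatile globals touched inside the ISR. Read the
// 16-bit counter from the main loop with interrupts disabled, since AVR does
// not read it atomically.
struct IsrNestingMonitor {
  volatile uint8_t depth = 0;       // invocations currently active
  volatile uint16_t reentries = 0;  // entries that found the counter already set

  void enter() {
    if (depth != 0) reentries = reentries + 1;  // counter set on entry: recursion
    depth = depth + 1;
  }
  void leave() {
    if (depth != 0) depth = depth - 1;
  }
};

// RAII helper so every return path of the handler decrements the counter.
struct IsrScope {
  IsrNestingMonitor& monitor;
  explicit IsrScope(IsrNestingMonitor& m) : monitor(m) { monitor.enter(); }
  ~IsrScope() { monitor.leave(); }
};

IsrNestingMonitor g_stepper_monitor;  // hypothetical: one monitor per handler

// Simulated handler: nest > 0 mimics another interrupt arriving mid-handler.
void simulated_isr(int nest) {
  IsrScope guard(g_stepper_monitor);
  if (nest > 0) simulated_isr(nest - 1);
}
```

A normal call leaves reentries at zero. A call nested two deep records two re-entries, and depth still returns to zero afterwards. Periodically printing reentries (for example alongside M100 F) would show whether recursion ever happens during a real print.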
I am all ears. Let me know how I can help. Obviously, let's keep you focused on your current task. I will do the M400 thing and report back.
After three failed prints and changing the LCD_UPDATE_INTERVAL to 1000 to free some CPU cycles, I finally managed to get a print to complete. If I have some time tomorrow, I will do what you suggest with the counters in the interrupts.
Added M400 at layer change: print stalls.
I was busy printing yesterday and had three stalls. This morning one already stalled on me, and it was such a simple straight-line job (about 40 minutes). That was with FWS on and the LCD interval set to 1000 ms. I believe these stalls come from more than just my options being turned on. I have now disabled all the extras: FWS, Volumetric and Linear Advance (excluding parking, nozzle cleaning and Bilinear bed leveling, which are still on).
@simon-jouet if you want to share your debugging method, I would gladly perform the same changes on my end to see if we are getting similar results. |
@empakoso I haven't had time to look into this just yet. I did some maintenance on the printer yesterday and noticed some dodgy connections on my extruder thermistor (so it might have been this). I will try restoring the LCD update interval to the default value and see if it still crashes; if so, I will do what was suggested before.
That's really interesting, in the meantime I will check my electrical connections. Hopefully I find something like you did. Thanks. |
Okay, I restored the LCD update interval and pulled the latest bugfix, and got the same issue: it froze. I just paused the print, resumed, and it kept going. Next print I will check for recursive calls. EDIT: It crashed 4 times during the print (at shorter intervals); again, just pausing and resuming allowed it to keep going.
Definitely paying attention. If you have the watchdog enabled, your 'pause' and 'resume' may be the same experience I had when I received a 'start' serial response. My print just started where it left off (it began by homing first, which crashed into my partial print).
I'm also paying close attention. You are not just talking to yourself. |
@Roxy-3D I completely understand. At the risk of sounding too eager, I was glad it wasn't just me. I appreciate your constant contact with this issue. I also understand that patience at this point is very important. Thank you. |
Okay, so I've added a counter to the ISR that prints a message if the counter doesn't have the expected value, and I didn't get any error message in the console. In case it's of any interest, below are the diff and the Pronterface logs from when I paused/resumed to keep going. This time it took quite a while (~2 hours) before the first crash, and the second was moments after I resumed. It's still printing, so it might crash a few more times before it's done...
Did it 'crash' ? Or did it lock up? In my mind, those are different things. It sounds more like you are having lockups but if you pause, and continue, that bumps the logic enough things can recover. |
@Roxy-3D you're right it locks up, the board doesn't reset and the pause/resume allows it to recover. |
Are there timeout timers I could adjust to see if it makes a difference? |
I just reverted to 1.1.x. I'm about to start a print that stalls pretty much every time; let's see if that's the case this time. EDIT: ah, it stalled a few seconds in, right after the skirt. Next print I will revert to 1.1.2.
I am at the ready with the following versions: With all the extras disabled, 1.1.3 doesn't stall as much. I only have M48, park, BiLinear mesh and nozzle clean enabled at the moment. I have left my FWS, fwRetract, and LinAdv disabled. |
I tried bugfix-1.1.x, 1.1.3, 1.1.2 and 1.1.1, and they all stalled; 1.1.0 bugfix used to work. I will keep going down in versions like you; hopefully we end up figuring it out.
@empakoso don't know if you got yours working? After rolling back a few versions that should have worked, I was still getting the issue. I tried changing the cable etc. and it was still the same, but I think it was still an issue with the USB connection and not the board itself. I went for a much shorter cable with OctoPi and it's been working perfectly. Hopefully your problem is somehow similar. @Roxy-3D sorry for the trouble :)
@simon-jouet I am very happy to hear you got it to work. I too thought this was the issue at first. I have already been using a short (45 cm), shielded, generally good-quality USB cable between my rig and the host. In your case it might also help to set your baud rate to 115200 as I have done; this helps with that issue as well. Sadly, I am still experiencing the stall. I have made something of a breakthrough, though. I now know that with all my cool Marlin options enabled, including FWS, I can overcome my stall by simply replacing M405 with M406 in my startup code. It prints 100% every time, even after more than three attempts. This means that if I disable the sensor, it prints. I will keep you posted.
@simon-jouet did you have this stall issue when you print from the SD card w/o a host connected? Meaning w/o using a USB cable? |
@empakoso good to hear that you figured out how to get it working; not optimal, but at least you have a workaround for now. I tried only once with the SD card and it stalled, but my LCD panel is homemade and the connection between the RAMPS and the SD card reader was far from optimal. When I have a bit more time I will redo my front panel with proper wiring, but that's not very important right now (I prefer to spend a bit of time trying to get the ESP32 working instead of the ATmega2560).
Ah yes, a noble cause! Good luck.
I think I might have found a scenario that causes an infinite loop in the ring buffer code for the filament sensor in planner.cpp. The clue from empakoso was that if he configures for the filament sensor but then turns it off via M406, the problem does not occur. That really narrowed down where the problem could be.

In line 1111 of planner.cpp: if filwidth_delay_dist is 209.9999999, for example, which it could be at this point in the code, then the result in filwidth_delay_index[0] will be 21, which exceeds the array size (with default config settings). Then when we hit line 1120: this condition will never be true, since filwidth_delay_index[1] is constrained to be in range, so the loop never breaks. I'm not sure why the addition of 0.0001 is there, but it probably should not be. empakoso has been emailing me; I suggested he remove the 0.0001, and he has offered to make this change and test it on his side. I just thought I would post the possible fix here for additional review.

The code has changed quite a bit since my original submission, but it looks like I had a similar infinite loop situation in my original code; there, it's a > that should be a >=. I'm not sure if it makes sense to fix that for prior versions of Marlin. This might also be causing the issue reported in #5851. This error only occurs under unique circumstances, for certain sizes of extruder moves (close to 210 mm with the default settings), so it will depend on the moves found in a particular G-code file.
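The failure mode above can be reproduced in isolation. The distance-to-index conversion can produce an index one past the end of the ring buffer. The fill loop, whose write index always stays in range, then never catches up with it. This is a hypothetical reconstruction for illustration, not a verbatim copy of planner.cpp. It assumes the default MAX_MEASUREMENT_DELAY of 20 (21 one-cm slots) and uses double on the host. On the AVR, where float and double are both 32-bit, the exact distances that trigger it will differ. The helper names are made up, and max_steps stands in for "loops forever".

```cpp
#include <cassert>

// Hypothetical reconstruction, not verbatim planner.cpp. Assumes the default
// MAX_MEASUREMENT_DELAY of 20, i.e. 21 one-cm slots covering 210 mm.
constexpr int MMD_CM = 20 + 1;            // slots in the measurement ring buffer
constexpr double MMD_MM = MMD_CM * 10.0;  // buffer length in mm

// Wraps the delay distance into the buffer length, then converts it to a
// slot index, optionally adding the suspicious 0.0001 offset.
int delay_index(double delay_dist_mm, bool add_offset) {
  while (delay_dist_mm >= MMD_MM) delay_dist_mm -= MMD_MM;  // modulus by the mm length
  const double cm = delay_dist_mm * 0.1 + (add_offset ? 0.0001 : 0.0);
  return static_cast<int>(cm);  // truncates toward zero
}

// Advances the write index one slot at a time until it equals target, as the
// fill loop is described. Returns slots filled, or -1 if it never exits.
int fill_slots(int write_index, int target, int max_steps = 1000) {
  assert(write_index >= 0 && write_index < MMD_CM);
  int steps = 0;
  do {
    write_index = (write_index + 1) % MMD_CM;  // always stays in 0..20
    if (++steps > max_steps) return -1;        // would spin forever
  } while (write_index != target);             // never true when target == 21
  return steps;
}
```

With the offset, 209.9999999 mm maps to index 21 and fill_slots never reaches it. Without the offset it maps to 20 and the loop exits normally. Following the same reasoning, the older > comparison would let a distance of exactly 210.0 survive the wrap and also produce index 21. Clamping or wrapping the computed index into 0..20 would be an alternative defensive fix; that is a separate technique from the one proposed above.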
@filipmu I uploaded the changes as indicated above. I loaded my problematic print/G-code with M405 (FWS option enabled) and used M407 to report the filament diameter at least once per layer. I checked my serial log at print start, and it reported M405 was 'ok'. To add to this firmware upload/debug, I enabled and used fwRetraction and Linear Advance (K=35) during my print. I know these do not have any bearing on this test; the intent was to exercise all conditions to make sure it all works. I am cautiously optimistic, since my first attempt printed 100% successfully. Now off to attempt a couple more.
Success! The code pointed out above was modified as described, and for the first time the problematic G-code has printed 100% complete on three attempts. I believe this bug was the cause of the stall. Thanks to everyone for sticking around to help me find this. Huge thanks to @filipmu for his continuing support of a great product, the Filament Width Sensor! I will leave this thread open so that the next steps of corrective action can complete this issue.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
OK, this will be my first issue post. I am very new to this developer forum format, so please bear with me. There may be a lot of extra information, but I have been driving my wife nuts trying to cover every avenue (she just doesn't understand why I am going through rolls of filament and junk prints :). I am an electro-mechanical machine tech, so this is right up my alley. What I understand well is intermittent issues... which is what I have. I have been able to reliably repeat this random occurrence (so I guess for me this is not so intermittent ;).
First, some background. This printer was a pilot project for use in my wife's bakery, hence the not-my-choice "pink" colour! A little name dropping: "Perfectly Sweet" in Smithville, Ontario, Canada.
All files noted below are found here: https://www.dropbox.com/sh/mzomrc7h0s5nopg/AAA4gp4hHF8y2mjrtzjZtUyHa?dl=0
A) The Issue (or "the hook"...)
I have zero'ed in on the Filament Width Sensor option. The Print Stall or Print Stop randomly happens when I enable:
The printer will print then stall without warning.
There is no fault reported; I just see a pause in Repetier, Pronterface or the OctoPi terminal, and then I see 'start' printed by the host. (Thanks to smart people like you, OctoPi will stop the print when it sees the line number doesn't match. Pronterface, not so much, oops!)
In a nutshell, these are summaries of what I think may be linked options:
I started using this option by upgrading the FW to 1.1.0-RC8.
At first, only larger prints would stop; then I realized it had something to do with the design's complexity and the number of arcs in the print. The more arcs, the sooner it would fault. idk, maybe? But one thing is for sure: I replicated the fault using this type of print.
B) The Hardware (or "the goods"!)
The board I am using is:
https://www.aliexpress.com/item/Reprap-3D-Printer-Control-Motherboard-ZRIB-V2-Compatible-with-RAMPS-1-4-Printer-Control-Reprap-Mendel/32759128127.html?shortkey=FZvQZBJZ&addresstype=600
I built a Filament Width Sensor by ordering the part from
https://www.thingiverse.com/thing:454584
http://objectswithintelligence.weebly.com/
Thank you, Filip! I like the sensor. The design is solid, and the measurements correspond to the physical and analog outputs, all tested using a scope and high-tech equipment not meant for personal use ;)
I am using it on this machine, a Prusa i3 replica (model: Prusa i3 EG-V2.0):
https://www.electronicgeek.ca/collections/3dprinters/products/reprap-prusa-i3-3d-printer-diy-kit-canada
The modifications I thought would correct the issues didn't work, but I have a solid machine.
C) The Firmware (or the "smarts")
I upgraded it to v1.1.0-RC8 to start using the FWS option.
The Prusa i3 EG-V2.0 (aka P802N) was installed with P802GA_8_ZRIB_Zonestar Marlin 1.1.0 RC5, and then the local distributor modified it to fit the final sale model.
Firmware sold/supplied with the printer (Marlin v1.1.0-RC5 corrected for EG printers):
https://storage.googleapis.com/wzukusers/user-18802988/documents/5890b97e21e66IZdz1xV/EG-1-V2-firmware.zip
Firmware supplied from China for this distributor:
https://storage.googleapis.com/wzukusers/user-18802988/documents/56d50eb61e94dIKT1fJR/P802GA_8_MarlinV1_ZRIB_Zonestar.zip
D) What I've done so far:
At first I thought it was something to do with some of these awesome new features for my printer: volumetric, linear advance, FW retract. So I read and read on this forum, looking for reports of stalls related to the ZRIB board or to these features on their own. This is a simple log I started to keep. NOTE that all of these changes were made with FWS enabled:
Trying to keep Filament Width Sensor!
1.1.0-2
LIN ADVANCE on, VOLUMETRIC on, FWRETRACT on (SLIC3R useFW not checked) = fail
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on (SLIC3R useFW not checked) = pass
1.1.1 - changed baud rate to 250000
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on (SLIC3R useFW not checked) = fail
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on (SLIC3R useFW checked) = fail
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT off = fail
LIN ADVANCE off, VOLUMETRIC off, FWRETRACT off = fail
1.1.1 - changed baud rate back to 115200
LIN ADVANCE off, VOLUMETRIC off, FWRETRACT off = pass
1.1.1 - with RELATIVE E on (SLIC3R on)
LIN ADVANCE off, VOLUMETRIC off, FWRETRACT off = pass
1.1.1 - baud rate 115200
- Relative E enabled
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT off = pass
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on = pass
note: with VOLUMETRIC disabled in the FW, the FILAMENT "E in mm3" is still 'ON'
could this be for the filament width sensor?????
note: enabling FWRETRACT from the LCD menu caused the print to hang!!
note: enabling E in mm3 from the LCD menu caused the print to hang!!!
Note: all these attempts led me to believe it was an intermittent issue with FWS enabled, since the results I received above were random and uncontrollable.
So I began toggling the FWS option while using M100 F (as once suggested by ThinkyHead in a previous stall thread). Here are my results:
Tried Marlin v1.1.3 (disabled: fwretract, volumetric, Lin Adv)
Print "Eric carl" cookie cutter:
4. FWS on, baud 115200, normal speeds: stalled at first layer
5. FWS on, baud 115200 (speeds slowed by half): stalled 50% into print
Printed a different print, revisedcat (not so many arcs, hence my belief it is related to arcs)
6. FWS on, baud 250000, SLOWDOWN off, microsteps set to 8 (was 16): stalled after 60%
7. (Returned FW back to point 1 above, 'start over') FWS off, M100 on (Slic3r set to send M100 F on layer change, using normal Prusa PLA 1.75 speeds): printed successfully twice
8. FWS off, M100 F at layer change (log file called "1 FWS off - print OK - M100 enabled.zip")
9. FWS on, M100 F not used (log file called "2 FWSon without M100 at layer change - print faulted - octoprint.zip")
10. FWS on, M100 F used at layer change ("3 FWSon with M100 at layer change - stalled 3rd print attempt.zip")
11. FWS on, M100 F used at layer change; now it is confirmed ("4 FWSon with M100 at layer change - again3rd attempt.zip")
E) Summary:
http://www.thingiverse.com/thing:2209962
If you've made it this far, all I can say is thank you for your interest. I am frantically trying to type and remember all my collected data and its events. Please forgive the data ramblings; I have tried to apply some weird sense of design of experiments (a long-ago college course, haha).
I'll respond as quickly as possible to any questions. I hope the above answers most of them. If someone with better coding experience than my home-made interest could please take a look at the relationship between FWS and these stalls, I would be very happy, and I am sure others would be as well.
...TIA