Print Stall / Print Stop (no faults reported - watchdog resets) vrs 1.1.3 and ZRIB board #6992

Closed
empakoso opened this issue Jun 8, 2017 · 47 comments


@empakoso

empakoso commented Jun 8, 2017

OK, this will be my first issue post. I am very new to this developer forum format, so please bear with me. There may be a lot of extra information, but I have been driving my wife nuts trying to cover every avenue (she just doesn't understand why I am going through rolls of filament and junk prints :). I am an electro-mechanical machine tech, so this is right up my alley. What I fully understand in this community is intermittent issues... which is what I have. I have been able to reliably repeat this random occurrence (so I guess for me it's not so intermittent ;).

First some background. This printer was a pilot project to use in my wife's bakery. Hence the-not-my-choice "pink" colour! Little name dropping "Perfectly Sweet" in Smithville, Ontario, Canada.

All files noted below are found here: https://www.dropbox.com/sh/mzomrc7h0s5nopg/AAA4gp4hHF8y2mjrtzjZtUyHa?dl=0

A) The Issue (or "the hook"...)
I have zeroed in on the Filament Width Sensor option. The print stall or print stop happens randomly when I enable:

#define FILAMENT_WIDTH_SENSOR

The printer will print, then stall without warning.
There is no fault reported; I just see a pause in Repetier, Pronterface, or the OctoPi terminal, and then I see 'start' in the host's output. (Thanks to smart people like you, OctoPi stops the print when it sees that the line number doesn't match. Pronterface, not so much, oops!)

In a nutshell, these are excerpts of the options I think may be linked:

#define FILAMENT_WIDTH_SENSOR
#define FILAMENT_LCD_DISPLAY
#define BAUDRATE 115200
#define AXIS_RELATIVE_MODES {false, false, false, true} //slic3r set up for this
#define MOTHERBOARD BOARD_ZRIB_V20
#define DEFAULT_XJERK                  5.0
#define DEFAULT_YJERK                  5.0
#define DEFAULT_ZJERK                  0.4
#define DEFAULT_EJERK                  5.0
#define DEFAULT_MAX_ACCELERATION      { 3000, 3000, 100, 5000 }
#define EEPROM_SETTINGS
#define M100_FREE_MEMORY_WATCHER
#define NOZZLE_CLEAN_FEATURE
#define MINIMUM_STEPPER_PULSE 1
#define MICROSTEP_MODES {16,16,16,16,16}
//#define LIN_ADVANCE
//#define FWRETRACT  
//#define VOLUMETRIC_DEFAULT_ON

I started using this option by upgrading the FW to 1.1.0-RC8.

At first, prints would stop only on larger prints; then I realized it had something to do with the design's complexity, specifically the number of arcs in the print. The more arcs, the sooner it would fault (maybe?). But one thing is for sure: I can replicate the fault using this type of print.

B) The Hardware (or "the goods"!)
The board I am using is:
https://www.aliexpress.com/item/Reprap-3D-Printer-Control-Motherboard-ZRIB-V2-Compatible-with-RAMPS-1-4-Printer-Control-Reprap-Mendel/32759128127.html?shortkey=FZvQZBJZ&addresstype=600

I built a Filament Width Sensor by ordering the part from
https://www.thingiverse.com/thing:454584
http://objectswithintelligence.weebly.com/
Thank you Filip! I like the sensor, the design is solid, and the measurements correspond to physical and analog output all tested using scope and high tech equipment not meant for personal use ;)

I am using it on this machine, Prusa i3 replica Model: Prusa i3 EG-V2.0
https://www.electronicgeek.ca/collections/3dprinters/products/reprap-prusa-i3-3d-printer-diy-kit-canada

I made the following modifications, which I thought would correct the issue. They didn't, but I now have a solid machine:

  1. heat sink on USB chip, and on board chip running Marlin. (heat sinks already on RAMPS step drivers)
  2. fan to cool the board runs all the time, chips are cool to the touch
  3. heavily shielded, high-quality USB cable, as short as possible
  4. ferrite beads on all the PID signaled wires (heaters)
  5. separated out the thermally resetting fuse and ran my own glass fuses (separated control from heaters)
  6. added a MOSFET switch for the heated bed, so there is no load on the transistor on the main board
  7. separated the stepper cables from everything that could be induced.
  8. moved the LCD screen ribbon cable outside and around anything electrical (maybe SD card interference - now not so much since I run OctoPi)
  9. took all my ribbon cables apart and used a special MIL-standard liquid contact enhancer (don't ask how I got it); let's just say when $hit-hits-fan this will send a signal!

C) The Firmware (or the "smarts")
I upgraded it to v1.1.0-RC8 to start using the FWS option.
The Prusa i3 EG-V2.0 (aka P802N) was shipped with P802GA_8_ZRIB_Zonestar Marlin 1.1.0 RC5, which the local distributor then modified to fit the final sale model.
Firmware sold/supplied with the printer (Marlin v1.1.0-RC5 corrected for EG printers):
https://storage.googleapis.com/wzukusers/user-18802988/documents/5890b97e21e66IZdz1xV/EG-1-V2-firmware.zip
Firmware supplied from China for this distributor:
https://storage.googleapis.com/wzukusers/user-18802988/documents/56d50eb61e94dIKT1fJR/P802GA_8_MarlinV1_ZRIB_Zonestar.zip

D) what I've done so far:
At first I thought it was something to do with some of these awesome new features for my printer: volumetric, linear advance, and FW retract. So I read and read on this forum for any reports of stalls, whether related to the ZRIB board or on their own. This is a simple log I started to keep. NOTE that all of these tests were run with FWS enabled:
Trying to keep Filament Width Sensor!
1.1.0-2
LIN ADVANCE on, VOLUMETRIC on, FWRETRACT on (Slic3r useFW not checked) = fail
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on (Slic3r useFW not checked) = pass
1.1.1 - changed baudrate to 250000
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on (Slic3r useFW not checked) = fail
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on (Slic3r useFW checked) = fail
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT off = fail
LIN ADVANCE off, VOLUMETRIC off, FWRETRACT off = fail
1.1.1 - changed baudrate back to 115200
LIN ADVANCE off, VOLUMETRIC off, FWRETRACT off = pass
1.1.1 - with RELATIVE E on (Slic3r on)
LIN ADVANCE off, VOLUMETRIC off, FWRETRACT off = pass
1.1.1 - baudrate 115200, Relative E enabled
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT off = pass
LIN ADVANCE on, VOLUMETRIC off, FWRETRACT on = pass
note: with VOLUMETRIC disabled in the FW, the FILAMENT "E in mm3" menu item is still 'ON'.
Could this be for the filament width sensor?
note: enabling FWRETRACT in the LCD menu caused the print to hang!!
note: enabling E in mm3 in the LCD menu caused the print to hang!!!

Note: all these attempts led me to believe it was an intermittent issue with FWS enabled, since the results above were random and uncontrollable.

So I began toggling the FWS option while using M100 F (as once suggested by Thinkyhead in a previous stall thread). Here are my results:

  1. Small prints never seem to be an issue without arcs.
  2. When prints stall, reverting to the originally supplied firmware makes them OK
  3. The issue started when the firmware was upgraded for the new FWS option

Tried Marlin V1.1.3 (disabled: fwretract, volumetric, Lin adv)
Print "Eric carl" cookie cutter:
4. FWSon baud-115200 normal speeds stall at first layer
5. FWSon baud-115200 (slowed speeds in half) stall 50% into print

Printed a different print, revisedcat (not as many arcs, hence my belief it is related to arcs)
6. FWSon baud-250000 SLOWDOWNoff, microstepping set to 8 (was 16) - stalled after 60%
7. (Returned the firmware to point 1 above, 'start over') FWSoff M100on (Slic3r inserts M100 F at each layer change, using normal Prusa PLA 1.75 speeds) - printed successfully twice
8. FWSoff M100 F at layer change (log file called "1 FWS off - print OK - M100 enabled.zip")
9. FWSon M100 F (not used) (log file called "2 FWSon without M100 at layer change - print faulted - octoprint.zip")
10. FWSon M100 F (used at layer change) (log file called "3 FWSon with M100 at layer change - stalled 3rd print attempt.zip")
11. FWSon M100 F (used at layer change): now it is confirmed (log file called "4 FWSon with M100 at layer change - again3rd attempt.zip")

E) Summary:

  1. So it seems this will only print when the FWS is disabled.
  2. It will stall in random locations of the print between first, second, or third attempt
  3. Prints stall most often with many arcs (large prints with arcs will also do it, such as
    http://www.thingiverse.com/thing:2209962 )
  4. Hosting software doesn't report any faults when it happens, whether Repetier, Pronterface, or OctoPrint
  5. The host waits for a response after a stall and times out, the printer reboots, and the line numbers don't start back at one, as reported by Pronterface in log #4 noted above
  6. M100 F does not report any corruption, as shown in those same logs #3 and #4.
  7. I notice an unexpected menu item, "E in mm3:", under Control > Filament when the volumetric and FWS options are off. I do not use this option in my G-code start code; instead I explicitly set M200 D0.

If you've made it this far, all I can say is thank you for your interest. I am frantically trying to type and remember all my collected data and its events. Please forgive the data ramblings; I have tried to apply some weird sense of designed experiments (long-ago college course, haha).

I'll respond as quickly as possible to any questions. I hope the above answers most of them. If someone with better coding experience than my home-made interest could please take a look at the relationship between FWS and these stalls, I would be very happy, and I am sure others would be as well.

...TIA

@Roxy-3D
Member

Roxy-3D commented Jun 8, 2017

One thing that would be helpful is try the same print from an SD-Memory card with the printer disconnected from anything else. If it still freezes we know the problem is in the printer's firmware and not a communication issue with the host computer.

Also... I doubt you are running out of memory... But after the printer has been running and doing things for a little while, how much free memory does M100 find?

@empakoso
Author

empakoso commented Jun 8, 2017

In fact I printed and noticed this first on my SD card while printing the perpetual wheel mentioned ( numerous times :)

@empakoso
Author

empakoso commented Jun 8, 2017

The M100 log I received was included with log 3 and 4. I can't recall at the moment but it was not varying at all, every response had the same values.

@Roxy-3D
Member

Roxy-3D commented Jun 8, 2017

It wasn't clear to me which files were the log files. We want 2KB of free memory just because that will rule out the possibility of running out of memory.

In fact I. Priced this first on my SD card printing the perpetual wheel mentioned ( numerous tims:)

I don't know what 'Fact I' is. But up until I posted a reply, the only mention of an SD Memory card was this:

moved the LCD screen ribbon cable outside and around anything electrical (maybe SD card interference - now not so much since I run OctoPi)

Seriously... People are not going to read a "War and Peace" novel just so they can have the privilege of being rebuked when they try to help you.

@Bob-the-Kuhn
Contributor

When you're getting the "start" message that means that the controller has gone through reset. If there are no error prints before the start message then most likely it's a watchdog timer timeout. Most of those are because the printer is overloaded and can't do the thermal management at the same time as moving the steppers.

The only solutions I know of are to lower the feed rates, use fewer microsteps, or disable options like Linear Advance or UBL leveling.
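Bob's description of a silent reset can be illustrated with a small host-side simulation. This is a hypothetical sketch, not Marlin's watchdog code: it assumes the firmware services ("pets") the watchdog once per main-loop pass, and the 4 s timeout and the `simulate` function are assumptions for illustration. If anything keeps the loop from getting back to that point for longer than the timeout (sustained overload, or code that never returns), the MCU resets without printing an error, so the host only sees another `start`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simulation of a watchdog reset as seen from the host; not Marlin code.
constexpr std::uint32_t kTimeoutMs = 4000;  // assumed watchdog timeout

// Runs a main loop for total_ms in steps of pass_ms. A healthy pass services
// ("pets") the watchdog. From stuck_at_ms on, the loop stops servicing it, as if
// stuck in code that never returns. Returns the lines the host would receive.
std::vector<std::string> simulate(std::uint32_t total_ms, std::uint32_t pass_ms,
                                  std::uint32_t stuck_at_ms) {
  std::vector<std::string> host_log{"start"};  // banner printed at power-up
  std::uint32_t last_pet_ms = 0;
  bool fault_active = true;
  for (std::uint32_t now = 0; now < total_ms; now += pass_ms) {
    if (now - last_pet_ms >= kTimeoutMs) {
      host_log.push_back("start");  // silent reset: no error line, just a reboot
      last_pet_ms = now;
      fault_active = false;          // assume the fault does not recur after reboot
      continue;
    }
    if (!fault_active || now < stuck_at_ms) last_pet_ms = now;  // healthy pass
  }
  return host_log;
}
```

For example, `simulate(10000, 10, 2000)` stops servicing the watchdog at 2 s and returns two `start` lines: the power-up banner and the one after the reset.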

@Roxy-3D
Member

Roxy-3D commented Jun 8, 2017

One easy way to get back a bunch of CPU cycles is to bump up the

#define LCD_UPDATE_INTERVAL 100

number. The LCD panel will start being a little bit sluggish. But especially if you are not really using the LCD Panel, it doesn't matter. All those free cycles can go to doing real work.
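The effect of raising `LCD_UPDATE_INTERVAL` can be sketched as a deadline check that only redraws once the interval has elapsed. The model below is hypothetical (not Marlin's LCD code), and the fixed cost per redraw is a made-up assumption; it only shows that the time spent redrawing scales inversely with the interval.

```cpp
#include <cstdint>

// Hypothetical model of an interval-gated display refresh; not Marlin's LCD code.
struct RefreshStats {
  std::uint32_t refreshes;  // how many redraws happened
  std::uint32_t busy_us;    // CPU time spent redrawing (assumed fixed cost each)
};

// The UI is polled every poll_ms, but it only redraws once interval_ms has
// elapsed since the previous redraw, like comparing millis() to a deadline.
RefreshStats simulate_ui(std::uint32_t total_ms, std::uint32_t poll_ms,
                         std::uint32_t interval_ms, std::uint32_t cost_us) {
  RefreshStats stats{0, 0};
  std::uint32_t next_update_ms = 0;
  for (std::uint32_t now = 0; now < total_ms; now += poll_ms) {
    if (now >= next_update_ms) {
      next_update_ms = now + interval_ms;
      ++stats.refreshes;
      stats.busy_us += cost_us;
    }
  }
  return stats;
}
```

With an assumed 5000 µs per redraw over 10 s, an interval of 100 ms gives 100 redraws (0.5 s of CPU time), while 1000 ms gives 10 (0.05 s).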

@empakoso
Author

empakoso commented Jun 8, 2017

I totally came across incorrectly. I was stunned at the speed of your response and was so happy. My mistake was my wording. I am very interested in your input. I had never intended to ever come across as a rebuke.
It was great to be able to respond that I had forgotten to mention my SD card attempts in my original post.
I apologize.

@empakoso
Author

empakoso commented Jun 8, 2017

Excellent, I will set my LCD UPDATE INTERVAL and report back.
Thanks,

@empakoso
Author

empakoso commented Jun 8, 2017

It's funny you say the feed rates, I was kind of leaning to that after I realized I could get further into the print before stalling if I reduced my speeds by half. I have disabled the linear advance option as well. So I concur with your post.

I will however, do as you say, and disable any UBL. I believe I am using BiLinear at the moment.

@empakoso empakoso closed this as completed Jun 8, 2017
@empakoso empakoso reopened this Jun 8, 2017
@Roxy-3D
Member

Roxy-3D commented Jun 8, 2017

I apologize.

Accepted...

Excellent, I will set my LCD UPDATE INTERVAL and report back.

It will make the LCD Panel a little bit 'sluggish'. But the LCD Panel will still work as expected and do everything you tell it to do. And if you are printing with a remote host... for how often you actually use the LCD Panel, it really doesn't matter.

@empakoso
Author

empakoso commented Jun 9, 2017

The LCD_UPDATE_INTERVAL was set to 300, and I also set microstepping down to 2. The print stalled.

I took a chance and set the LCD_UPDATE_INTERVAL to 1000 ms, and for the first time one of my most stall-prone prints was successful. Thank you. What a relief.

I still have a few questions regarding this problem.
-Does the M420 option take up a lot of processing power? (I disabled it but kept a 9-point bilinear bed leveling setting for my G29.)
-Does M100 take up processing power? I want to trim down my processing usage since my ZRIB seems to be weak in that department.
-Why do I have a Filament LCD menu when the Volumetric and Filament Width Sensor options are turned off? I am not sure why I can see 'E in mm3' as an option. Could it be because I left the 'define display filament width' option (just after the FWS option) on?

@Roxy-3D
Member

Roxy-3D commented Jun 9, 2017

-Does M100 take up processing power? I want to trim down my processing usage since my ZRIB seems to be weak in that department.

I can answer the M100 question. The answer is "No!" M100 is passive unless you ask it for information. M100 initializes the free memory area with 0xE5 values. When M100 is looking for corruption... It is looking for non-0xE5 values in the middle of the free area.

And the 'free memory' area is simply the block of solid 0xE5 numbers. The stack will eat at that block from one direction. And the heap will eat at the block from the other direction. Right now, there should be no heap activity.

Once you print a layer of your file, you can pause the print and do an M100 F. If the free memory is greater than 2KB, you can turn the M100 code (option) off. Or, you can just edit your GCode file to add an M100 F at each layer change and watch your host to see what it says.
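Roxy's description of M100 amounts to a fill-and-scan technique. The sketch below runs on a plain byte array rather than real AVR RAM, and the function names are hypothetical; it is not M100's actual implementation. It paints a region with 0xE5, then measures the longest untouched run (the usable free memory, which you would compare against the 2KB margin) and counts stray non-0xE5 bytes inside the painted area, which would suggest corruption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side sketch of the fill-and-scan idea; not M100's code.
constexpr std::uint8_t kFreeMarker = 0xE5;  // fill value Roxy describes

// Paint the whole region with the marker (done once, before anything runs).
void paint_free_area(std::uint8_t* mem, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) mem[i] = kFreeMarker;
}

// Longest unbroken run of markers: memory that neither the heap (growing up)
// nor the stack (growing down) has touched yet.
std::size_t longest_free_run(const std::uint8_t* mem, std::size_t len) {
  std::size_t best = 0, run = 0;
  for (std::size_t i = 0; i < len; ++i) {
    run = (mem[i] == kFreeMarker) ? run + 1 : 0;
    if (run > best) best = run;
  }
  return best;
}

// Non-marker bytes between the first and last marker bytes. Heap and stack
// only eat the block from its ends, so anything here suggests a stray write.
std::size_t stray_bytes_inside(const std::uint8_t* mem, std::size_t len) {
  std::size_t first = len, last = 0;
  for (std::size_t i = 0; i < len; ++i) {
    if (mem[i] == kFreeMarker) {
      if (first == len) first = i;
      last = i;
    }
  }
  if (first == len) return 0;  // no markers left at all
  std::size_t stray = 0;
  for (std::size_t i = first; i <= last; ++i)
    if (mem[i] != kFreeMarker) ++stray;
  return stray;
}
```

For instance, painting 64 bytes, overwriting 10 at each end to mimic heap and stack growth, and then one byte in the middle leaves a longest free run of 23 bytes and one stray byte.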

@empakoso
Author

empakoso commented Jun 9, 2017

Thank you for that insight to M100. I will continue to monitor by saving my OctoPi serial logs with M100 F responses.

@simon-jouet
Contributor

simon-jouet commented Jun 10, 2017

I'm actually getting a very similar issue with version 1.1.3 (I'm running bugfix-1.1.x from this afternoon), and the printer just hangs at some point during the print.

I don't get any error message or warning, I can pause the print and send move commands in pronterface, I will try to change the LCD_UPDATE_INTERVAL as suggested by @Roxy-3D and let you know if that's better but second x-carriage that hang after a few hours in.

Here is my config if it's of any use

Configuration.txt

I'm running a normal RAMPS 1.4 board with a custom panel (with an SSD1306 and a rotary encoder).

@Roxy-3D
Member

Roxy-3D commented Jun 10, 2017

I will try to change the LCD_UPDATE_INTERVAL as suggested by @Roxy-3D and let you know if that's better but second x-carriage that hand after a few hours in.

This is a brute force way to lighten the load on the CPU. If we are running into trouble because the CPU is always busy... We need to find out why and fix it. If doing this brute force 'solution' lets you print, Great! But that doesn't mean the problem is fixed!

@simon-jouet
Contributor

Hi Roxy, yeah I get that. If that's the issue and this print works, I will roll back to my last working version and we can do a diff to see what has changed between then and now.

@empakoso
Author

empakoso commented Jun 10, 2017

I have been cautiously optimistic. I completed this problematic print once. This is a huge step forward for me, thank you @Roxy-3D

My second attempt at the same print failed. As I discovered, it always fails within 1-3 attempts. I thought I was going out of my mind because I couldn't nail it down to just one print, or, even better, the same location each time! I contacted @filipmu and he enlightened me to a possibility, namely "interrupts". I am sure it has to do with the amount of things going on at once during the print. I am sure it is not the Arduino-based board I am using.

My next attempt is to remove PID from my heat bed temp control and remove my SD card support.

Hopefully we can get to bottom of this.

@Roxy-3D
Member

Roxy-3D commented Jun 10, 2017

And just to be clear... I do think the interrupt code works. But if the CPU is too busy and an interrupt fires while the processor is in the (previous) interrupt code, the stack starts to wind up. If that happens again and again and again... Eventually the processor uses up all the free memory. And the firmware will crash.

So if that is happening... We need to figure out why. But reducing how often the LCD Panel gets updated will free up cycles.
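Roxy's stack wind-up scenario can be modeled in discrete time. This is a hypothetical sketch, not Marlin code, and it assumes the handler runs with interrupts re-enabled so that a new invocation can preempt the one already in progress. An interrupt arrives every `period_ticks`, and only the innermost invocation makes progress. When a handler needs no more time than the period, nesting never exceeds one level; when it needs more, every new interrupt adds a level that never unwinds, and each level holds its own stack frame.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical discrete-time model of a nested (re-entrant) interrupt handler.
// Assumption: the handler re-enables interrupts, so a new invocation preempts
// the one in progress. Each tick, an interrupt may fire, and only the innermost
// (most recent) invocation gets one tick of CPU time.
// Returns the deepest nesting reached; each level would hold its own stack frame.
std::size_t max_nesting(int period_ticks, int handler_ticks, int total_ticks) {
  std::vector<int> active;  // remaining ticks per invocation; back() is innermost
  std::size_t max_depth = 0;
  for (int t = 0; t < total_ticks; ++t) {
    if (t % period_ticks == 0) active.push_back(handler_ticks);  // interrupt fires
    if (active.size() > max_depth) max_depth = active.size();
    if (!active.empty() && --active.back() == 0) active.pop_back();  // innermost returns
  }
  return max_depth;
}
```

`max_nesting(5, 3, 100)` stays at depth 1, while `max_nesting(5, 6, 100)` reaches depth 20, one level per interrupt. Multiplying the depth by an assumed per-frame stack cost shows how quickly the free memory that M100 reports would disappear.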

@empakoso
Author

@Roxy-3D your explanation makes complete sense.

Would there be a way to break up each interrupt handler using some type of allow-buffer-to-empty wait command? Or should I try to use M400 on every layer change after my M100 F?

It might print and it may allow for some type of proof of concept?

@Roxy-3D
Member

Roxy-3D commented Jun 10, 2017

You could do that. That would be valuable information.

I'm in the middle of trying to get the LCD Panel to do UBL Mesh Editing for 20x4 displays, so I'm kind of distracted. But what I would do is go add a few lines of code to every interrupt handler so it bumped a counter. And I would have it decrement the counter when it left the ISR. And if that counter ever was set when the interrupt handler started, we know we have recursion happening. And that would provide proof this is the problem.

@empakoso
Author

I am all ears. Let me know how I can help. Obviously let's keep you focused on your current task. I will do the M400 thing and report back.

@simon-jouet
Contributor

After three failed prints and changing the LCD_UPDATE_INTERVAL to 1000 to free some CPU cycles, I finally managed to get a print to complete. If I have some time tomorrow, I will do what you suggest with the counters in the interrupts.

@empakoso
Author

During Layer Change added M400 - Print Stalls

@empakoso
Author

I was busy printing yesterday and had three stalls. This morning one has already stalled on me. It was such a simple straight-line job (about 40 minutes). That was with FWS on and the LCD interval set to 1000 ms.

I believe these stalls are caused by more than just my options being turned on. I have now disabled all the extras: FWS, Volumetric, and Linear Advance (excluding parking, nozzle cleaning and bilinear bed leveling, which are still on).

@empakoso
Author

@simon-jouet if you want to share your debugging method, I would gladly perform the same changes on my end to see if we are getting similar results.

@simon-jouet
Contributor

@empakoso I haven't had time to look into this just yet, I did some maintenance on the printer yesterday and noticed some dodgy connections on my extruder thermistor (so might have been this). I will try to restore the LCD update interval to the default value and see if it still crashes, if so I will do what was suggested before

@empakoso
Author

That's really interesting, in the meantime I will check my electrical connections. Hopefully I find something like you did. Thanks.

@simon-jouet
Contributor

simon-jouet commented Jun 12, 2017

Okay, I restored the LCD update interval and pulled the latest bugfix: same issue, it froze. I just paused the print and resumed, and it kept going. Next print I will check for recursive calls.

EDIT: It crashed 4 times during the print (at shorter intervals); again, just pausing and resuming allowed it to keep going.

@empakoso
Author

Definitely paying attention.

If you have the watchdog enabled, your 'pause' and 'resume' may be the same experience I had when I received a 'start' serial response. My print just started where it left off (it began by homing first, which crashed into my partial print).

@Roxy-3D
Member

Roxy-3D commented Jun 12, 2017

Definitely paying attention.

I'm also paying close attention. You are not just talking to yourself.

@empakoso
Author

@Roxy-3D I completely understand. At the risk of sounding too eager, I was glad it wasn't just me. I appreciate your constant contact with this issue. I also understand that patience at this point is very important. Thank you.

@simon-jouet
Contributor

Okay, so I've added a counter to the ISRs that prints an error if the counter doesn't have the expected value, and I didn't get any error message in the console. In case it's of interest, below is the diff, and the Pronterface log showing where I paused/resumed to keep going. This time it took quite a while, ~2 hours, before the first crash, then the second was moments after I resumed. It's still printing, so it might crash a few more times before it's done...

diff --git a/Marlin/stepper.cpp b/Marlin/stepper.cpp
index 4b9167b6..27f59a48 100644
--- a/Marlin/stepper.cpp
+++ b/Marlin/stepper.cpp
@@ -334,12 +334,20 @@ void Stepper::set_directions() {
  *  2000     1 KHz - sleep rate
  *  4000   500  Hz - init rate
  */
+volatile int stepisrcnt = 0;
 ISR(TIMER1_COMPA_vect) {
+  stepisrcnt++;
+  if (stepisrcnt != 1) {
+    SERIAL_ERROR_START();
+    SERIAL_ERRORLNPGM("recursive stepper isr.");
+  }
+
   #if ENABLED(ADVANCE) || ENABLED(LIN_ADVANCE)
     Stepper::advance_isr_scheduler();
   #else
     Stepper::isr();
   #endif
+  stepisrcnt--;
 }

 #define _ENABLE_ISRs() do { cli(); if (thermalManager.in_temp_isr) CBI(TIMSK0, OCIE0B); else SBI(TIMSK0, OCIE0B); ENABLE_STEPPER_DRIVER_INTERRUPT(); } while(0)
diff --git a/Marlin/temperature.cpp b/Marlin/temperature.cpp
index 56be635d..0c77e37e 100644
--- a/Marlin/temperature.cpp
+++ b/Marlin/temperature.cpp
@@ -1606,7 +1606,17 @@ void Temperature::set_current_temp_raw() {
  *  - For PINS_DEBUGGING, monitor and report endstop pins
  *  - For ENDSTOP_INTERRUPTS_FEATURE check endstops if flagged
  */
-ISR(TIMER0_COMPB_vect) { Temperature::isr(); }
+volatile int tempisrcnt = 0;
+ISR(TIMER0_COMPB_vect) {
+  tempisrcnt++;
+  if (tempisrcnt != 1) {
+    SERIAL_ERROR_START();
+    SERIAL_ERRORLNPGM("recursive temp isr.");
+  }
+
+  Temperature::isr();
+  tempisrcnt--;
+}

 volatile bool Temperature::in_temp_isr = false;

Setting Print and Travel Acceleration: 2000.00
Setting Print and Travel Acceleration: 1000.00
Print paused at: 14:49:03
Resuming.
Print resumed at: 14:49:04
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 800.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 800.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 800.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 2000.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 2000.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 2000.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 800.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 800.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 800.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 2000.00
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 2000.00
Setting Print and Travel Acceleration: 1000.00
Print paused at: 14:49:26
Resuming.
Print resumed at: 14:49:27
Setting Print and Travel Acceleration: 1000.00
Setting Print and Travel Acceleration: 800.00
Setting Print and Travel Acceleration: 1000.00

@Roxy-3D
Member

Roxy-3D commented Jun 13, 2017

This time it took quite a while ~2hours before the first crash then the second was moments after I resumed

Did it 'crash' ? Or did it lock up? In my mind, those are different things. It sounds more like you are having lockups but if you pause, and continue, that bumps the logic enough things can recover.

@simon-jouet
Contributor

@Roxy-3D you're right it locks up, the board doesn't reset and the pause/resume allows it to recover.

@empakoso
Author

Are there timeout timers I could adjust to see if it makes a difference?

@simon-jouet
Contributor

simon-jouet commented Jun 14, 2017

I just reverted to 1.1.x. I'm about to start a print that stalls pretty much every time; let's see if that's the case this time.

EDIT: Ah, it stalled a few seconds in, right after the skirt. Next print I will revert to 1.1.2.

@empakoso
Author

I am at the ready with the following versions:
1.1.0-RC5 (original with new printer) = no stalling
1.1.0-RC8 = stalled
1.1.0 = stalled
1.1.0-2 = stalled
1.1.1 = stalled
1.1.2 = stalled
1.1.3 = stalled
1.1.0-bugfix is waiting for upload...

With all the extras disabled, 1.1.3 doesn't stall as much. I only have M48, park, BiLinear mesh and nozzle clean enabled at the moment. I have left my FWS, fwRetract, and LinAdv disabled.

@simon-jouet
Contributor

simon-jouet commented Jun 14, 2017

I tried bugfix-1.1.x, 1.1.3, 1.1.2 and 1.1.1, and they all stalled. 1.1.0 bugfix used to work; I will keep going down in versions like you. Hopefully we end up figuring it out.

@simon-jouet
Contributor

@empakoso I don't know if you got yours working? After rolling back a few versions that should have worked, I was still getting the issue. I tried changing the cable etc. and it was still the same, but I think it was still an issue with the USB connection and not the board itself. I went for a much shorter cable with OctoPi and it's been working perfectly. Hopefully your problem is somehow similar.

@Roxy-3D sorry for the troubles :)

@empakoso
Author

@simon-jouet I am very happy to hear you got it to work. I too thought this was the issue at first. I have already been using a short (45 cm), shielded, and generally good-quality USB cable between my rig and the host. In your case it might also help to set your baud rate to 115200 as I have done; this helps with that issue as well.

Sadly I am still experiencing the stall. I have made something of a breakthrough, though. I now know that with all my cool Marlin options enabled, including FWS, I can overcome my stall by simply replacing M405 with M406 in my startup code. It prints 100% every time, even after more than three tries.

This means that I disable the sensor and it prints. I will keep you posted.

@empakoso
Author

@simon-jouet did you have this stall issue when you print from the SD card w/o a host connected? Meaning w/o using a USB cable?

@simon-jouet
Contributor

@empakoso good to hear that you figured how to get it working, not optimal but at least you have a workaround for now.

I tried only once with the SD card and it stalled, but my LCD panel is homemade and the connector between the RAMPS and the SD card reader was far from optimal. When I have a bit more time I will redo my front panel with proper wiring, but that's not very important right now (I prefer to spend a bit of time trying to get the ESP32 working instead of the ATmega2560).

@empakoso
Author

Ah yes, a noble cause! Good luck.

@filipmu

filipmu commented Jun 22, 2017

I think I might have found a scenario that causes an infinite loop in the ring buffer code for the filament sensor in planner.cpp. The clue from empakoso was that if he configures for the filament sensor, but then turns it off via M406, the problem does not occur. It really narrowed down where the problems could be.

In line 1111 of planner.cpp:
filwidth_delay_index[0] = (int)(filwidth_delay_dist * 0.1 + 0.0001);

If filwidth_delay_dist is 209.9999999, for example, which it could be at this point in the code, then the result in filwidth_delay_index[0] will be 21, which exceeds the array size (with default config settings).

Then when we hit line 1120:
} while (filwidth_delay_index[0] != filwidth_delay_index[1]); // More slots to fill?

The loop's exit condition (the two indices being equal) will never be met, since filwidth_delay_index[1] is constrained to be in range, and so the loop never breaks.

Not sure why the addition of 0.0001 is there, but it probably should not be.

empakoso has been emailing me and I suggested he remove the 0.0001 and he has offered to make this change and test it on his side. Just thought I would post the possible fix here for additional review.

The code has changed quite a bit since my original submission, but it looks like I had a similar infinite-loop situation in my original code; in that code it's a > that should be a >=. Not sure if it makes sense to fix that for prior versions of Marlin.

This might also be causing the issue reported here #5851

This error only occurs under unique circumstances for certain sizes of the extruder moves (close to 210mm for the default settings). So it will depend on the moves found in a particular gcode file.
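To check the arithmetic without a printer, here is a standalone re-creation of the two quoted lines. It is not planner.cpp itself: it assumes 21 ring-buffer slots (indices 0..20, one per cm, so distances wrap at 210 mm), which matches the "21 exceeds the array size" remark, and it adds an iteration cap to the fill loop purely so the demo terminates.

```cpp
// Standalone re-creation of the quoted planner.cpp arithmetic (not the real file).
// Assumption: 21 ring-buffer slots, indices 0..20, one per cm of filament,
// so the delay distance wraps at 210 mm.
constexpr int kSlots = 21;

// Index as quoted: the +0.0001 bias can turn a distance just under 210 mm into 21.
int index_biased(float dist_mm) { return (int)(dist_mm * 0.1 + 0.0001); }

// Suggested change: without the bias, distances below 210 mm map to 0..20.
int index_unbiased(float dist_mm) { return (int)(dist_mm * 0.1); }

// Mirrors the quoted do/while fill loop, plus a cap the original does not have.
// Returns the number of slots written before write_idx reaches target, or -1 if
// target can never be reached (the original loop would never exit).
int fill_until(int target, int write_idx) {
  for (int written = 1; written <= 2 * kSlots; ++written) {
    write_idx = (write_idx + 1) % kSlots;  // next unused slot, always 0..20
    if (write_idx == target) return written;
  }
  return -1;
}
```

With a distance of 209.9995 mm, `index_biased` returns 21 and `fill_until(21, 0)` returns -1 (the unguarded loop would spin forever), whereas `index_unbiased` returns 20, which the loop reaches normally. Removing the bias is the change proposed in the thread; the cap in `fill_until` is only there for the demonstration.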

@empakoso
Author

@filipmu Uploaded the changes as indicated above. Loaded my problematic print/gcode with M405 (enables the FWS option) and set M407 to report the filament diameter at least once per layer. I checked my serial log at print start, and it reported M405 as 'ok'.

To add to this FW upload/debug, I enabled and used fwRetraction and Linear Advance (K=35) during my print. I know these have no bearing on this test; the intent was to exercise all conditions to make sure it all works.

I am cautiously optimistic since my first attempt printed 100% successfully.

Now off to attempt a couple more.

@empakoso
Author

Success! The bug pointed out above was modified as described, and for the first time the problematic gcode has printed to 100% completion on three attempts.

I believe this bug was the cause of the stall.

Thanks to everyone for sticking around to help me find this. Huge thanks to @filipmu for his continuing support of a great product, the Filament Width Sensor!

I will leave this thread open so that the next steps of corrective action can complete this issue.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 26, 2022

6 participants