[BUG] "Heating Failed" after PID takes over #21661

Closed
Thinkersbluff opened this issue Apr 19, 2021 · 67 comments

Comments

@Thinkersbluff

Thinkersbluff commented Apr 19, 2021

Did you test the latest bugfix-2.0.x code?

Yes, and the problem still exists.

Bug Description

This bug is present in the CR6-SE Community Firmware at Release 6, which incorporates Marlin bugfix2.0. I am running this particular version, compiled for my printer's hardware configuration: CF6.1-Pre2-btt-skr-cr6-with-stock-creality-tft-2021-04-18-22-12.zip

I reported this bug on the Community Firmware GitHub Issue#248, but the CF6 developer has asked me to report it upstream, here.

Description:
The nozzle temperature sometimes fails to reach the target value when heating.
When this is happening, the temperature may climb to within a few degrees of target, but then drops again, cycling around a center value approximately 10 degrees below target. Eventually, the system throws a "Heating Failed failed to achieve target temperature within the alloted timeframe" message on the screen and "kills" the job, forcing the user to cycle power to recover.

wtf
wtf.txt

Bug Timeline

Issues with PID not performing as well as in the past have been reported by several users since Release 6 of the Community Firmware.

Expected behavior

Nozzle should heat to the target temperature and stabilize. This is particularly expected when the system has just completed a PID autotune (M303 E0 Sxxx U1) at the same target temperature without problems, yet now cannot heat to that temperature to print.

Actual behavior

When this problem occurs, the serial interface shows that the printer recognizes the correct target temperature, yet it stops short of heating to that value, cycling instead around a value about 10 deg C lower than the target.

NOTE: These users on the Creality CR6SE/MAX Official Facebook Group describe this same problem in other scenarios, so it is not specifically or uniquely an issue when heating to 235C or when running an esteps extrusion.

I have also been able to successfully achieve 230C when I could not achieve 235C and I have achieved 235C when the nozzle was already at 230C, so there are other parameters at play, here, that I have not yet isolated. The part cooling fan was off, the whole time.

In the final cycle of the second graph (see my comment below this post), the printer actually bumped-up from 230 to 235 just at the end, there. No idea why, as you can see it was settling at the lower value & I touched nothing.

Steps to Reproduce

I was able to reproduce this problem fairly consistently as follows:

  1. Connect Octoprint to the printer & monitor the serial interface and the Temperature plot to observe what happens
  2. Use the CF6 PID function to PID the nozzle at 235C
  3. Wait for the nozzle to cool to 145C (or set the temperature to 145C, to let it stabilize there.)
  4. Use the CF6 esteps function to run a calibration extrusion at 235C.
  5. The printer recognizes and reports on the serial interface that the target temperature is 235
  6. The printer stops short of 235 when heating, instead cooling again around 230C
  7. The printer then cycles around approx 225 +/-5 degrees, never trying to achieve 235, as seen in the attached logs.
  8. If left long enough (sorry, did not time it), the printer throws a Heating Failed alert and kills the process.

Version of Marlin Firmware

Latest Bugfix2.0 merged into Community Firmware Release 6.1 Pre 2 on 18 April 2021

Printer model

Creality CR6-SE

Electronics

BTT SKR CR6 motherboard, stock hotend, stock cooling fans, stock TFT. Users with Creality 4.5.3 boards also report this issue.

Add-ons

None

Your Slicer

Cura

Host Software

OctoPrint

@Thinkersbluff
Author

I twice edited the above to include this second graph and log, but the template seems to be truncating it out:
wtf2
wtf2.txt

@Talha909

Talha909 commented Apr 19, 2021

I have the same issue. I am using the 12 April bugfix build. Yesterday I tried to run PID tuning, but it failed, and the hotend does not reach the desired temperature. It always stays about 10 degrees below. I have an SKR 1.4 Turbo with TMC 2130, and a second printer with SKR E3V2 and SKR mini.

@Sebazzz
Contributor

Sebazzz commented Apr 19, 2021

@Thinkersbluff Have we been able to exclude #21374 being the cause?

@portals999

I'm using CR6Comm-CF6-Final-cr6-se-v4.5.3-mb-2021-03-27-15-53 and can confirm I too have this issue. I never saw the problem until I got a new 4.5.3 board and upgraded to CR6.

The temp drops to 10C below what you set it to and hovers around there on the heating and homing screen until the yellow heating error screen appears shortly after.

It is always approx 10C too low and I can consistently reproduce this issue.

@scooter2214

Same problem here... Board 4.5.3 and Stock TFT... I use the CR6Comm-CF6-Final-cr6-se-v4.5.3-mb-2021-03-27-15-53.zip ...

The temp drops to 10C below what you set it to and hovers around there on the heating and homing screen until the yellow heating error screen appears shortly after.

It is always approx 10C too low and I can consistently reproduce this issue, too

any solutions ?

@Thinkersbluff
Author

@Thinkersbluff Have we been able to exclude #21374 being the cause?

Nope. No evidence either way.
I do suspect, though, that rounding-off the thermistor readings may be causing PID to get “tricked” into cycling around the target temperature minus PID_FUNCTIONAL_RANGE. That final temperature cycle in wtf2.png above was clearly converging on the lower value then suddenly “flipped” up to the target temp. I did nothing to cause that. Thermal inertia and random rounding errors could explain that.

@Thinkersbluff
Author

@Thinkersbluff Have we been able to exclude #21374 being the cause?
UPDATE - my testing of the Community Firmware Release 6 Prerelease 6 today shows the same behaviour with or without PR #21374 merged into it. This suggests that this problem pre-dates PR #21374.

It appears that the problem is related to using the parameters returned by the current PID routine.
Custom Build Test 5
Rel6Pre6 Test 5

@rhapsodyv
Member

I just sent this to make temperature internal calculations using high precision again, maybe worth a test: #21678

@Thinkersbluff
Author

I just sent this to make temperature internal calculations using high precision again, maybe worth a test: #21678

Thank you. I just found this existing issue which sounds very similar to the behaviour I am reporting here - #20463.

This problem may date back over a year, if related.

@thisiskeithb
Member

There have been several temperature-related fixes merged within the last couple days. Please download bugfix-2.0.x to test with the latest code and let us know if you're still having this issue.

@Sebazzz
Contributor

Sebazzz commented Apr 24, 2021

Sorry, it has become less stable. Not only "heating failed" errors but thermal runaway as well. Need some time to collect additional data.

edit: Reverted before the PR and at least there is no thermal runaway.

@Thinkersbluff
Author

There have been several temperature-related fixes merged within the last couple days. Please download bugfix-2.0.x to test with the latest code and let us know if you're still having this issue.

Thank you for the outreach.
@Sebazzz let me know that the latest bugfix2.0 had been merged with the Community Firmware Nightly build of Release 6.1 tonight, so I did download and test it on my system. (I have a Creality CR6-SE with a BTT SKR CR6 v1.0 motherboard and a stock Creality DWIN display.)
My tests did NOT result in any Heating Failed or Heating Runaway alarms today, but the PID tuning performance is still not working well, and the PID Autotune function still seems to make things worse instead of better.

When I was reviewing the closed Issues for possible matches to this problem, I found the description and discussion in issue 20463 was particularly similar and relevant to the situation I think we are having here. For that reason, I have experimented with two configurations in these tests:

  1. The predefined Configuration.h values of PID_K1=0.95 and PID_FUNCTIONAL_RANGE=10, and
  2. A version incorporating advice from the discussion in that other issue, with PID_K1=0.55 and PID_FUNCTIONAL_RANGE=16

The test procedures that I used and the results that I logged are documented in the .txt file under each plot.

Here is a copy of the configuration files from the second series of tests. They differ from the first series only in the value of those two parameters.
Configs.zip

My limited math skills and lack of C++ literacy are slowing down my efforts to understand why these changes seem to improve both the hotend stabilization and the PID Autotune function, but the plots seem to confirm that they do.

I would particularly like to understand the significance of editing these parameters, so that I don't recommend changes that only break something else. I wish I had some way of independently validating the PID factors being generated by Autotune...
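
For readers wondering what a smoothing factor such as PID_K1 does in general, the following is a minimal sketch. It assumes, based only on the "smoothing factor" wording, that the factor is the weight kept from the previous value in an exponential moving average of the derivative term. The names `SmoothedDerivative` and `update` and all numbers are hypothetical and are not taken from Marlin's source.

```cpp
// Hypothetical illustration of exponential smoothing on a PID derivative term.
// Not Marlin's implementation; names and values are assumptions.
struct SmoothedDerivative {
    float k1;         // weight kept from the previous smoothed value (e.g. 0.95 or 0.55)
    float last_temp;  // previous temperature sample, in deg C
    float d_term;     // smoothed derivative contribution

    SmoothedDerivative(float smoothing, float start_temp)
        : k1(smoothing), last_temp(start_temp), d_term(0.0f) {}

    // Feed one temperature sample; returns the smoothed derivative term.
    float update(float kd, float temp) {
        const float raw = kd * (last_temp - temp);  // opposes the direction of change
        d_term = k1 * d_term + (1.0f - k1) * raw;   // blend old value with new sample
        last_temp = temp;
        return d_term;
    }
};
```

Under these assumptions, with kd = 100 and a 1 degree rise in one sample, a factor of 0.95 lets roughly 5% of the raw derivative through on the first step, while 0.55 lets roughly 45% through. A lower factor therefore reacts faster but also passes more sensor noise into the output.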

6 1Nightly24Apr21_PID_K1-95_RANGE-10_DefaultKs
6.1Nightly24Apr21_PID_K1-95_RANGE-10_DefaultKs.txt
6 1Nightly24Apr21_PID_K1-55_RANGE-16_DefaultKs
6.1Nightly24Apr21_PID_K1-55_RANGE-16_DefaultKs.txt

@pillopaolo

I have a similar problem (temperature 5-10 degrees higher than setpoint) with SKR 1.4 (3 of them), while on SKR 1.3 (4 of them) I have never had problems. Same printers (Ender3, Sovol SV01). It seems to be caused by strange noise/peaks in the thermistor reading.

I improved things a lot by increasing ADC_LOWPASS_K_VALUE from the default of 2 to 6 or 8.

Reading around and looking at the schematics, I see that the SKR 1.4 (and maybe other boards) has a different configuration than the SKR 1.3, including an ESD suppressor (CG0603MLC-3.3LE). Might this lead to issues with temperature reading, which need special attention?
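
To illustrate why a larger low-pass constant can tame noisy thermistor readings, here is a minimal sketch of a generic shift-based exponential low-pass filter. It is an assumption that ADC_LOWPASS_K_VALUE behaves like the shift count `k` below. The `LowPass` type and the sample values are hypothetical and are not copied from Marlin's HAL.

```cpp
#include <cstdint>

// Generic integer low-pass filter: each update moves the output
// 1 / 2^k of the way towards the new sample.
// Hypothetical sketch, not Marlin's HAL code.
struct LowPass {
    uint8_t k;     // filter strength; a larger k gives heavier smoothing
    uint32_t acc;  // accumulator holding the filtered value scaled by 2^k

    LowPass(uint8_t k_value, uint16_t initial)
        : k(k_value), acc(static_cast<uint32_t>(initial) << k_value) {}

    // Feed one raw ADC sample; returns the filtered reading.
    uint16_t update(uint16_t sample) {
        acc = acc - (acc >> k) + sample;
        return static_cast<uint16_t>(acc >> k);
    }
};
```

In this sketch, a filter settled at 1000 that sees a single spike of 1400 reports 1100 with k = 2 but only 1006 with k = 6. The price is a slower response to genuine temperature changes.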

@Thinkersbluff
Author

Thinkersbluff commented Apr 25, 2021

In my work life, we found that one of the easiest ways to explore complex technical issues is to exchange highly simplified pictorial models. If this model is obviously over-simplified in the mind of one or more participant, that suggests an important factor of which other participants may not be aware. By marking up this model to "correct" it, we facilitate a focused conversion of implicit to explicit knowledge. Through that exchange, we all better understand what matters.
Currently, I am thinking that PID Autotune is not working on at least some machines, but it seems - because the number of folks reacting to this issue seems low - most Marlin users are not noticing this issue on their own systems. That implies to me that those of us with problems are operating systems that are configured differently somehow from those other users, in a TBD critical way. Figuring out which differences we can ignore and which seem to be relevant is the hard part.

Open question to those who are interested in solving this problem: Does this model show all of the essential elements that need to be considered + all of the controls available to us to adjust our machine(s) back into a stable operating region?
PID Model.pptx
PID Model

@Sebazzz
Contributor

Sebazzz commented Apr 26, 2021

Possibly related #18642 by @bohbotjames

@pillopaolo

Adding to my comment above.

If I activate PID debug (#define PID_BED_DEBUG, then M303 D), I see crazy PID output values, especially on the derivative action, when temperature values are noisy and jump +/- 1 or 2 degrees in a very short time.

And the fact that the temperature hovers exactly 10 degrees above/below Setpoint is not a coincidence.
Most of the time PID_FUNCTIONAL_RANGE=10!
This means that when the temperature-noise issue above is severe, PID gives crazy numbers and is no longer able to control the heater, so you end up switching all the time between bang-bang and PID modes. The result is a temperature oscillating around [Setpoint - PID_FUNCTIONAL_RANGE] or around [Setpoint + PID_FUNCTIONAL_RANGE].
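
The mode switching described here can be sketched as follows. This is a simplified illustration under the assumption that the controller uses full power when the temperature is more than PID_FUNCTIONAL_RANGE below the setpoint, switches the heater off when it is more than that far above, and uses PID in between. `HeaterMode` and `select_mode` are hypothetical names, not Marlin identifiers.

```cpp
// Hypothetical sketch of choosing a control mode from the temperature error.
enum class HeaterMode { Off, FullPower, Pid };

// target, current and functional_range are in deg C.
inline HeaterMode select_mode(float target, float current, float functional_range) {
    const float error = target - current;
    if (target == 0.0f || error < -functional_range) return HeaterMode::Off;  // far too hot
    if (error > functional_range) return HeaterMode::FullPower;               // far too cold
    return HeaterMode::Pid;                                                   // inside the window
}
```

With a 235C target and a range of 10, a reading of 220C selects full power while 226C hands over to PID. If the PID output is too weak to hold the temperature, the reading sinks back to about 225C, full power takes over again, and the cycle repeats near the setpoint minus the range. With a range of 16, the same 220C reading stays in PID mode.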

I did not try, but I bet that if you change PID_FUNCTIONAL_RANGE value, your temperature will hover at a different value.

@Thinkersbluff
Author

Thinkersbluff commented Apr 26, 2021

I did not try, but I bet that if you change PID_FUNCTIONAL_RANGE value, your temperature will hover at a different value.

This was also my expectation, so I did explore the impact of changing PID_FUNCTIONAL_RANGE.

The plots posted here show that if I reduce the parameter to 5, the oscillation is more likely to occur consistently.
If I increase the value to 16, it is more likely to achieve the target temperature.

pfr=5_PID_K1=0 95_Tests1-3
CustomBuild_PIDSmoothing95_Range16

Reducing PID_K1 smoothing factor from 0.95 to 0.55 also improved the system’s performance. Doing both seemed to remove the initial undershoot and ringing almost completely for the conditions tested.
CustomBuild_PIDSmoothing55_Range16

The ADC_LOWPASS_K_VALUE parameter is unique to the HAL.h for LPC1768, so it is not an option for CR6 users on STM32F1 motherboards.

I am only looking at nozzle temperature at the moment but I think the bed and chamber and laser cooler all use the same PID function.

@Thinkersbluff
Author

Possibly related #18642 by @bohbotjames

I agree that looks like the same issue in July 2020. Closed by the author same day without explanation.

@Sebazzz
Contributor

Sebazzz commented Apr 26, 2021 via email

@pillopaolo

pillopaolo commented May 1, 2021

Dear developers, it seems that SKR 1.4 (my case, see also Talha909 ) and CR-6 (see Sebazzz) boards have temperature noise issues, apparently due to their electronics setup.

Why not add a user-configurable filter for the temperature? Something like OVERSAMPLENR, but a bit more powerful and configurable from Configuration_adv.h.

@Thinkersbluff, could you pls reply to above question of Sebazzz? That would be helpful.

Thanks

@Thinkersbluff
Author

@Thinkersbluff, could you pls reply to above question of Sebazzz? That would be helpful.

Which question have I not answered? These posts seem a little out of sequence but there are at least two stock 4.5.3 MB users above confirming they have this issue.

@Thinkersbluff
Author

Thinkersbluff commented May 7, 2021

I updated my Ender 3 with Marlin 2.0.8, yesterday (fantastic work by all, thank you!)
That printer has a BTT SKR E3 Turbo motherboard (a board that is known to have an ADC noise vulnerability in the version I have installed), a MicroSwiss hot end, and a stock cooling system.
I then repeated one of the tests that I had done on my CR6 SE while documenting this PID problem.

In the first graph:

  1. I selected Nozzle Temperature of 240C, with the default PID parameters. Notice that there is a short small undershoot and then temperature converges on target and holds +/-1deg.
  2. I ran PID by sending M303 S240 E0 C8 U1 via the terminal
  3. I repeated step 1, with the new calculated PID parameters. Notice that the system then exhibited stable "ringing" as seen on other systems, but this time it is centered on the target value, instead of being centered below the target.

In this test, the PID process itself generated a bit of a messy curve, not the smooth cycling I am used to seeing. That may somehow correlate to the set of factors that Marlin computed. The +/- 5-6 degree cycling around the target value with the computed PID parameters persisted for the 10 minutes I left it running.

In the second graph:

  1. I cycled power off/on to reset the system and then repeated the first test sequence above, EXCEPT that I did not allow the system to cool as much before I proceeded to the next step.
  2. I left that 3rd curve from the first test in this second graph, to highlight the contrast between PID performance with the default PID parameters versus PID performance with the calculated parameters.
  3. I noted that the PID Autotune cycling looked much more "normal" in this second run.
  4. The calculated PID parameters were quite different in this second run of the test and the nozzle heating was able to converge on target temperature, after a bit of undershooting.

Ender3_Marlin2 0 8_PID-Test
Ender3_Marlin2 0 8_PID-Test2

@Thinkersbluff
Author

I am presently struggling valiantly through the PID "documentation" referenced in the Marlin Configuration.h. (https://reprap.org/wiki/PID_Tuning)

That documentation refers to a parameter which appears to have disappeared from Configuration.h since the document was last updated: "The 'sum of errors*time' value is limited to the range +/-PID_INTEGRAL_DRIVE_MAX as set in Configuration.h."

There are 49 Closed issues which include a reference to PID_INTEGRAL_DRIVE_MAX, so I parsed through them to find out what happened and why. Turns out #4881 removed it (see 3rd issue below). The referenced documentation is out of date.

These 3 closed issues sound a lot like this current issue. The second issue was closed without any corrective action :

  1. Hotend PID fails to reach target when Auto Bed Level feature is turned on. #4086
    • A PID_FUNCTIONAL_RANGE parameter of 50 or 100 seemed to solve the problem
    • The team concluded that the EEPROM PID data or pointer data was somehow being corrupted
      - EEPROM Checksum #4167 merged into Marlin 1.1.0
  2. Exruder heater never stabilizes at set temp #3211
    • A PID_FUNCTIONAL_RANGE parameter of 50 seemed to solve the problem
      - No changes made to Marlin
  3. Bug in PID integrator anti-windup #4881
    • A bug was identified in the Integrator portion of PID
      - Pid unconstrained itemp #4914 merged into Marlin 1.1.0, removing the PID_INTEGRAL_DRIVE_MAX parameter from Configuration.h and removing that time range constraint from the integral term.
    • Numerous other comments suggested other issues remained with PID, but Issue was closed.

@Thinkersbluff
Author

Following the advice from team members back in 2016, found on the above related issues, here is the same Ender 3 system as tested in the previous comment, running Marlin 2.0.8 with PID_FUNCTIONAL_RANGE set to 50 (test conditions and serial log are captured in the .txt file below the graph):

Ender3_Marlin2 0 8_PID_FUNCTIONAL+RANGE=50
Ender3_Marlin2_0_8_PID_FUNCTIONAL_RANGE=50.txt

I believe this one parameter change could be optimized with experimentation, but anecdotal feedback from numerous issue reports seems to support a suggestion that the current default of 10 may be too low for some systems. This suggests to me that Marlin is capable of calculating usable parameters and of performing PID, but it needs a bit more "elbow room" to work, hence the positive impact of increasing the value of that functional range parameter.

@esunayg

esunayg commented May 14, 2021

btt skr cr6 + cf6,1p3
I have been using my device with some small fluctuations and just wanted to make some adjustments, so I have been experimenting with auto-tuning. According to my findings, the PID loop doesn't calculate the D value correctly. Even if you set D and I to 0, it doesn't overshoot. Normally the output should start from a fixed K contribution depending on the target, and then over time the I term should enter the calculation and increase the output value. D comes last, to prevent overshoot. Any PID device works fine without a D value if you don't push K and I too high. Even a K value alone gives a stable response at a set value. So we are all trying to cover up the D problem by adjusting K and I. That's why auto-tune fails to respond: it thinks that the D value is being changed/iterated, but it is not. Just check on your device with only a K value, and I and D at 0. The temperature "must" stabilize at any temperature, but it doesn't. I am digging deeper to understand the D value calculation. Either a variable doesn't change, or it is miscalculated/miscalled in the PID PWM calculation loop.

// edit: I take it back.

@LouwrensMB

I just encountered the same problem: my bed temperature heats up fine, but my nozzle does not reach the set temperature.
Was there a solution to this ticket, or is it still open?
"Heating Failed" timeout #21661

@Thinkersbluff
Author

Modifying the PID_FUNCTIONAL_RANGE and PID_K1 factors as described above is at least a proven work-around.

@thinkyhead
Member

I would recommend against using bang-bang heating for the bed because it tends to produce visible artifacts in the print.

@thinkyhead changed the title on Aug 27, 2021, from: Nozzle temperature sometimes cannot reach target - hovers instead about 10 degrees lower until "Heating Failed" timeout, to: [BUG] "Heating Failed" after PID takes over
@thinkyhead
Member

If the values produced by M303 PID auto-tune are lacking, there are some tutorials online that include instructions for hand-tuning your PID settings to improve power curve and temperature stability. In general, if PID is taking over and failing to heat fast enough, you can try increasing the P term as a first step. This will increase power but may lead to more overshoot and possibly reduced stability, so you will need to follow up by tuning the I and D terms until the temperature is optimally stable.

It may be the case that power to your hotend and/or bed are being leached when both are turned on at the same time. So, when tuning your hotend PID it will help to turn on bed power, and when PID-tuning the bed, it may actually help to turn on the hotend. If you typically run your fan above a certain temperature with auto-fan, you should run the auto-fan during your PID tuning. All of these will ensure that the PID values are set according to the amount of power that is available during printing. If there is any doubt that this will help, give it a try and see if the PID values are very different from doing isolated PID tuning.

@thinkyhead
Member

thinkyhead commented Aug 27, 2021

when I was entering the PID data in the machine they were not persisting, despite an M500 command.

NOTE: Be sure to include the U parameter in your M303 command to immediately use the PID values. Then they can be saved with M500. Check immediately with M503 that the values are applied. Then, reboot and re-check with M503 that the values are stored. If PID values are failing to store to EEPROM, that is a separate issue and will require a new bug report.

@jimberg98

@thinkyhead Do you know why only half of max power is used? You set it to 255 and you only ever see 127. I fixed it in my code so it doesn't divide the power until it's in PID functional range. My bed has an AC heater, so the power supply is barely taxed at 50 watts for the hot end. My bed and hot end reach temp at about the same time now. I didn't change the way the hot end heats as far as power goes.

@thinkyhead
Member

Are you saying this half power behavior is something that recent versions of Marlin PID do, but older versions of the PID code don't do?

@ManuelMcLure
Contributor

When I looked at the code some time ago, I determined that @127 is actually full power - I don't know why the PWM power only goes from 0-127 but I expect it's something to do with signed values being needed for the PID algorithm.

@ManuelMcLure
Contributor

It looks like

temp_hotend[e].soft_pwm_amount = (temp_hotend[e].celsius > temp_range[e].mintemp || is_preheating(e)) && temp_hotend[e].celsius < temp_range[e].maxtemp ? (int)get_pid_output_hotend(e) >> 1 : 0;
is where the value is limited to 0-127.
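
As a small illustration of the final step in that expression, the sketch below truncates a PID output to an integer and shifts it right by one bit. It assumes the PID output lies in the range 0 to 255. `halve_pid_output` is a hypothetical helper, not a Marlin function.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the last step of the quoted expression:
// truncate the PID output to an integer, then shift right by one bit.
// Assumes pid_output is within 0..255.
inline uint8_t halve_pid_output(float pid_output) {
    return static_cast<uint8_t>(static_cast<int>(pid_output) >> 1);
}
```

Under that assumption, an output of 255 becomes 127 and an output of 128 becomes 64, which matches the observation that a requested maximum of 255 only ever shows up as 127.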

@jimberg98

jimberg98 commented Aug 27, 2021

@ManuelMcLure, yep. I'm not sure why they do that. I took it out (the >> 1). My bed heats fine and stays at temp reliably.

@Thinkersbluff
Author

I would recommend against using bang-bang heating for the bed because it tends to produce visible artifacts in the print.

This is very helpful advice. Thank you. Do you happen to have photos or a link that explains how to recognize this type of artifact? I often see folks asking what causes some types of print artifacts but I have never seen this particular response as a possibility.

@ManuelMcLure
Contributor

ManuelMcLure commented Aug 27, 2021

Well, the comments in Configuration.h seem to imply that 0-127 is correct:

// Incrementing this by 1 will double the software PWM frequency,
// affecting heaters, and the fan if FAN_SOFT_PWM is enabled.
// However, control resolution will be halved for each increment;
// at zero value, there are 128 effective control positions.
// :[0,1,2,3,4,5,6,7]
#define SOFT_PWM_SCALE 0

Note the "128 effective control positions" bit.

However, I find

constexpr uint8_t pwm_mask = TERN0(SOFT_PWM_DITHER, _BV(SOFT_PWM_SCALE) - 1);
suspect, since most often SOFT_PWM_SCALE will be 0 and that means that pwm_mask will end up with a value of 1 << -1. C considers the results of << as undefined if either operand is negative.

EDIT: never mind - the - is outside the _BV() call.
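
The correction in the EDIT can be checked with a short sketch. It assumes that _BV(n) expands to (1 << n). `bit_value` and `dither_mask` are hypothetical stand-ins, used here so the example does not depend on Marlin's macros.

```cpp
#include <cstdint>

// Stand-in for the _BV() macro, assumed here to expand to (1 << n).
constexpr uint8_t bit_value(unsigned n) {
    return static_cast<uint8_t>(1u << n);
}

// Mirrors "_BV(SOFT_PWM_SCALE) - 1": the subtraction happens after the shift.
constexpr uint8_t dither_mask(unsigned soft_pwm_scale) {
    return static_cast<uint8_t>(bit_value(soft_pwm_scale) - 1);
}

// With a scale of 0 the mask is (1 << 0) - 1 = 0, not 1 << -1.
static_assert(dither_mask(0) == 0, "scale 0 gives an empty mask");
```

Because the minus sign sits outside the shift, a scale of 0 gives a mask of 0 and a scale of 3 gives 7, so no negative shift count occurs.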

@Sebazzz
Contributor

Sebazzz commented Sep 4, 2021

For what it is worth, the PID values determined by the Marlin autotune for my E3D Hemera are:

M301 P51.5971 I9.3473 D71.2040

These values cause the temperature going up and down endlessly. However, the values below actually work and are stable:

M301 P34.9800 I3.8300 D79.9200

In both cases PID_FUNCTIONAL_RANGE is 25.

@avolkov

avolkov commented Oct 5, 2021

@thinkyhead The command that saves to EEPROM that you gave fixes the issue -- M303 E-1 C8 S90 U

However, I don't think it is the same bug that Creality and BTT SKR 1.4 Turbo users were experiencing in the beginning of the thread.

I was experiencing the same issue with SKR 1.4 Turbo and Marlin 2.0.9.1, and I've tried fixes using ADC_LOWPASS_K_VALUE and PID_K1 and PID_FUNCTIONAL_RANGE workaround settings. None of them worked.

I ran M501 and the values for bed Kp Ki Kd were all zero, even though I defined them in Configuration.h; here's a sample output --

echo:  M301 P20.08 I1.30 D77.33
echo:  M304 P0.00 I0.00 D0.00

I think the bug here is that the values should be defaulting from the values defined in Configuration.h and not zeroed out. Also it seems it is currently not possible to load existing BEDPID values, it is only possible to write them using M303.

It could also be that the SKR board doesn't honor the code that is supposed to load the values.

I had the same issue with:

  • Stall sensitivity (M914) for X and Y axis
  • Steps per mm (M92)

I suspect this is what users keep filing when they realize they can't use values in Configuration.h when EEPROM is enabled -- #12468

I'm happy to open a ticket based on this.

It seems all the comments starting on 2021 Jun 21 are referring to EEPROM read issue in 2.0.9.1 rather than PIDTEMP bug with Creality/SKR

@ManuelMcLure
Contributor

ManuelMcLure commented Oct 5, 2021

So, just to make sure we're on the same page, M501 will load existing values from EEPROM - it will completely ignore any values set in the firmware configuration files. You need to use M502 to copy the firmware values into RAM and M500 to save them back to EEPROM. Only then will M501 be able to load them back properly.
Apologies if this is something you're already aware of, but I didn't see any mention of M502 in your comment so I want to make absolutely sure that's not the cause of your issues.

@avolkov

avolkov commented Oct 5, 2021

@ManuelMcLure Thank you. I'm moving from Marlin 1.1.9, where I only sparsely used EEPROM, and I didn't realize I needed to use M502 to load values from firmware.

This seems to be a common misconception, maybe better wording in Configuration.h could alleviate the problem. Referring to Marlin 2.0.9.1, M502 mentions resetting to 'factory defaults' but doesn't mention that defaults also need to be loaded with M502

/**
 * EEPROM
 *
 * Persistent storage to preserve configurable settings across reboots.
 *
 *   M500 - Store settings to EEPROM.
 *   M501 - Read settings from EEPROM. (i.e., Throw away unsaved changes)
 *   M502 - Revert settings to "factory" defaults. (Follow with M500 to init the EEPROM.)
 */

@ManuelMcLure
Contributor

Yeah, it's a bit confusing. I always try to explain that there are three levels of configuration in Marlin - RAM, EEPROM, and firmware.
RAM is what the printer will use at runtime. EEPROM is used to initialize the RAM settings on printer boot or if you use M501.
Firmware settings will not override EEPROM settings unless:

  • You upgrade the firmware to a version that changes the EEPROM layout (i.e. adding or removing EEPROM variables) - if Marlin detects this case it will automatically copy the firmware values to EEPROM and RAM on first boot.
  • You explicitly use M502 to load the firmware values into RAM and then use M500 to store them, or use the equivalent LCD menus (Restore Defaults + Store Settings or Initialize EEPROM)

This is done because there's no easy way to detect whether a value was changed in the configuration files if the EEPROM layout didn't change. If you (for example) have updated your Z probe offset and stored it to EEPROM, and then load a new version of Marlin where you forgot to change the Z probe in the configuration, we don't want to override the EEPROM value and possibly cause a nozzle crash.
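
The three levels described above can be modelled with a small sketch. This is a simplified illustration only: `Settings`, `PrinterModel`, and the single `bed_kp` field are hypothetical, the method names merely echo the G-code commands, and the automatic reset after an EEPROM layout change is not modelled.

```cpp
// Hypothetical model of the three configuration levels: firmware, EEPROM, RAM.
struct Settings {
    float bed_kp;  // a single example setting
};

struct PrinterModel {
    Settings firmware;  // compiled-in defaults from the configuration files
    Settings eeprom;    // persistent storage
    Settings ram;       // what the printer actually uses at runtime

    // On boot, RAM is initialized from EEPROM, not from the firmware defaults.
    PrinterModel(Settings compiled, Settings stored)
        : firmware(compiled), eeprom(stored), ram(stored) {}

    void m500() { eeprom = ram; }    // store RAM settings to EEPROM
    void m501() { ram = eeprom; }    // reload RAM from EEPROM
    void m502() { ram = firmware; }  // reset RAM to compiled-in defaults
};
```

In this model, a printer whose EEPROM holds a bed Kp of 0 keeps reporting 0 after `m501()`, whatever the compiled-in default is. Only `m502()` followed by `m500()` makes the compiled-in value survive the next load.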

avolkov added a commit to avolkov/dolly-marlin that referenced this issue Oct 5, 2021
@github-actions

github-actions bot commented Dec 5, 2021

This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days.

@bergie5737

bergie5737 commented Feb 1, 2022

Hi
I recently upgraded my firmware to the latest Marlin bugfix. I have a Wanhao i3 clone. In my case, my bed was still using bang-bang when my prints started to fail with "temperature error" on the display. I replaced the thermocouple on my hotend, as I thought that was the issue. I then enabled PID for the bed, and now my bed stops 10C below target. When I do a PID tune, the bed gets to the correct temperature. Any other way of heating stops 10C below target. I've increased the PID functional range to 50, and it got worse. To me it appears the heater won't come on unless within 50 deg Celsius. :-)
I am adding this as it seems the issue is not resolved. The hot end is always stable for me.

@tombrazier
Contributor

I might have some capacity to look into this bug. Is there anyone who is actively watching this and could do some testing? @Thinkersbluff maybe?

In the meantime, if anyone wants to experiment with an alternative to PID, I have submitted a PR for model predictive control and I would like to hear back from others how it works for them. #23751

@Thinkersbluff
Author

Thinkersbluff commented Mar 18, 2022

I might have some capacity to look into this bug. Is there anyone who is actively watching this and could do some testing? @Thinkersbluff maybe?

Yes, I am monitoring this actively and yes, I can make time to run specific tests. I am an anal-retentive retired Engineer with just enough knowledge and experience to follow instructions rigorously, but I am not a programmer and I have no access to an electronics lab or to exotic test equipment like oscilloscopes.

I have an Ender3 and a CR6-SE.
I have modified both printers to direct-drive, with all-metal hotends and 32-bit BTT motherboards. (SKR E3 Turbo on the Ender, SKR CR6 on the CR6). I use Octoprint on a Pi3b+ to remotely monitor and control each of the two printers.

I do use VSCode/Platformio to compile Marlin 2.x for the Ender, and I can do the same for the CR-6. The display only works if I use the Community Firmware on the CR-6, though, so some controlled experiments may only be possible in “headless mode” on that one. The Ender3 uses the original rotary knob/LCD controller.

I have the stock Type1 thermistors and aftermarket Eigweit 40W heater elements on both printers.

The CR-6 is currently fitted with a Trianglelabs DragonHF hotend, and I find the aftermarket heater is a marginal fit in that E3 V6 clone heater block. I had to crank the retention screw as tight as I could, to hold the element firmly and I cannot improve on that, right now. I added thermal paste to improve thermal coupling between heater element and heat block on both printers. The Ender thermistor is a glass bead type, with no thermal paste. The CR6 thermistor is a cartridge type, with thermal paste and a grub screw.

Both printers do seem to be working, with a small ripple on the extruder temperature but no evidence of this original issue on either machine, “unfortunately”. One part of any worthwhile experimentation may need to be figuring out how to destabilize the PID control again, before testing a “fix”?

I do have a PT1000 Type 47 thermistor available for the CR6 but not for the Ender. The BTT ADC on both printers does seem particularly vulnerable to EMI, and I got frustrated with its inability to stabilize nozzle temperature with that PT1000 thermistor installed, so I rolled back the mod.

How can I be of help?

@tombrazier
Contributor

Hmm. If the error no longer occurs, perhaps there is nothing to do. Are you using a different firmware to the one that generated the graphs above? If so an easy test would be to return to that firmware and see if the problem returns.

@Thinkersbluff
Author

an easy test would be to return to that firmware and see if the problem returns.

I understand.
I have changed both hardware and firmware configurations, since last I reported this behaviour. Even going back to the previous firmware would not restore my printer to the actual condition it was in, when the above graphs were generated.

I originally posted my graphs and reports on the CR6 Community Firmware GitHub, when I saw several CR6 Facebook community members chatting about the issue and blaming it on the CF. We later concluded that the bug was here, upstream of the CF fork, and there certainly do seem to have been a series of bugs over the years, with similar-sounding issues.

I can see the temperature readings in the Octoprint Terminal "jumping" up and down by a couple of degrees at a time, from one sample to the next. I imagine this "dithering" is largely ADC noise overlaid on the "actual" thermal reading, which is digitized with +/- one digit of uncertainty at the ADC resolution. I do not understand enough of the system design to know how many significant figures the firmware really has to work with, but there must come a point beyond which it is futile to try to derive more accurate readings from the available data...
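
To put a rough number on that resolution limit, here is a small Python sketch that converts one ADC count into degrees for an NTC thermistor read through a pull-up resistor. The component values (100k NTC, Beta 3950, 4.7k pull-up) are assumed typical values, not measurements from either printer, and the function names are made up for this example.

```python
import math

# Assumed, typical values; not taken from this thread.
R_PULLUP = 4700.0     # pull-up resistor to the ADC reference, ohms
R0 = 100_000.0        # NTC resistance at 25 C, ohms
BETA = 3950.0         # Beta constant of the NTC, kelvin
T0_K = 25.0 + 273.15  # reference temperature, kelvin


def count_to_temp_c(count, bits):
    """Convert a raw ADC count to degrees C using the Beta model."""
    ratio = count / ((1 << bits) - 1)          # V_adc / V_ref
    r_ntc = R_PULLUP * ratio / (1.0 - ratio)   # divider solved for the NTC
    inv_t = 1.0 / T0_K + math.log(r_ntc / R0) / BETA
    return 1.0 / inv_t - 273.15


def temp_c_to_count(temp_c, bits):
    """Inverse of count_to_temp_c, rounded to the nearest whole count."""
    r_ntc = R0 * math.exp(BETA * (1.0 / (temp_c + 273.15) - 1.0 / T0_K))
    ratio = r_ntc / (r_ntc + R_PULLUP)
    return round(ratio * ((1 << bits) - 1))


def step_near(temp_c, bits):
    """Temperature change caused by a one-count change near temp_c."""
    count = temp_c_to_count(temp_c, bits)
    return abs(count_to_temp_c(count + 1, bits) - count_to_temp_c(count, bits))


for bits in (10, 12):
    print(f"{bits}-bit ADC: one count is about {step_near(200.0, bits):.2f} C near 200 C")
```

With these assumed values a single count is worth roughly half a degree near 200 C at 10 bits and about a quarter of that at 12 bits, so a reading that flickers by a few counts would already show up as jumps of a degree or two.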

If your PR is designed to improve the ability of Marlin to cope with "noisy" data, is there any value to a log of thermal readings from the terminal on my system? (i.e. do you have a simulation setup to compare the data at the output of various points in Marlin, with and without your PR?)
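
If a terminal log does turn out to be useful, something like the following Python sketch could summarize it. It assumes the log contains Marlin-style temperature reports of the form `T:<current> /<target>`. The sample lines are invented for illustration, and `nozzle_readings` and `ripple_summary` are hypothetical helper names.

```python
import re
import statistics

# Matches the nozzle part of a report such as "T:199.00 /210.00 B:60.00 /60.00 @:127".
TEMP_RE = re.compile(r"(?<![A-Za-z@])T:\s*(-?\d+(?:\.\d+)?)\s*/\s*(-?\d+(?:\.\d+)?)")


def nozzle_readings(lines):
    """Return (current, target) pairs from every line holding a nozzle report."""
    readings = []
    for line in lines:
        match = TEMP_RE.search(line)
        if match:
            readings.append((float(match.group(1)), float(match.group(2))))
    return readings


def ripple_summary(readings):
    """Summarize how far the nozzle sits from target and how much it moves."""
    temps = [current for current, _ in readings]
    target = readings[-1][1]
    mean = statistics.fmean(temps)
    return {
        "samples": len(temps),
        "mean": mean,
        "mean_offset": mean - target,
        "peak_to_peak": max(temps) - min(temps),
        "stdev": statistics.pstdev(temps),
    }


# Invented sample lines, shaped like an Octoprint terminal capture.
sample_log = [
    "Recv:  T:199.00 /210.00 B:60.00 /60.00 @:127 B@:0",
    "Recv: echo:busy: processing",
    "Recv:  T:201.50 /210.00 B:60.00 /60.00 @:110 B@:0",
    "Recv:  T:198.50 /210.00 B:60.10 /60.00 @:127 B@:0",
    "Recv:  T:200.50 /210.00 B:59.90 /60.00 @:115 B@:12",
]

summary = ripple_summary(nozzle_readings(sample_log))
print(summary)
```

A summary like this separates a steady offset below target from sample-to-sample jitter, which are the two symptoms discussed in this issue.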

@tombrazier
Contributor

This is a really fraught area with so many variables. Different sources of error can easily be conflated and what might be a software issue on one machine could look similar to a hardware error on another.

Some of what you describe sounds like #22893. After a long conversation, that issue resulted in two PRs from me which have just been merged, #23871 and #23867. Both could have an effect on the apparent quantization of ADC values. One activates 12 bit ADC (which was supposed to be the norm for 32 bit ARM processors but owing to a subtlety was not) and the other allows 16 times oversampling for when 12 bit ADC is used. On the other hand, maybe the behaviour you mention has a quite different source!
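
As a rough illustration of why oversampling matters for the apparent quantization, the Python sketch below simulates a noisy 12-bit ADC and compares single conversions with the average of 16 summed conversions. It is a simulation under assumed noise (Gaussian, 3 counts), not the code from either PR, and the names are hypothetical.

```python
import random
import statistics

ADC_BITS = 12     # resolution of the simulated converter
OVERSAMPLE = 16   # number of conversions summed per reading


def read_adc(true_count, noise_counts, rng):
    """One simulated conversion: true value plus Gaussian noise, rounded and clamped."""
    noisy = round(rng.gauss(true_count, noise_counts))
    return max(0, min((1 << ADC_BITS) - 1, noisy))


def oversampled_read(true_count, noise_counts, rng):
    """Sum OVERSAMPLE conversions. Sixteen 12-bit values always fit in 16 bits."""
    return sum(read_adc(true_count, noise_counts, rng) for _ in range(OVERSAMPLE))


rng = random.Random(1)  # fixed seed so the run is repeatable
single = [read_adc(560.3, 3.0, rng) for _ in range(500)]
summed = [oversampled_read(560.3, 3.0, rng) for _ in range(500)]
averaged = [value / OVERSAMPLE for value in summed]

print("spread of single conversions:", statistics.pstdev(single))
print("spread of 16x oversampled readings:", statistics.pstdev(averaged))
```

Averaging also produces values between whole counts, so the reading can move in steps finer than one ADC count when the underlying noise is roughly random.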

If I had someone who could replicate the 10 degree offset we might be able to work together to find its source.

My MPC PR should do better than PID with noisy data. However I do need real systems to test it against because I have already established that it does well against simulated hotends.

How does the CF work? Does it merge in upstream Marlin changes? Is it similar enough that upstream PRs could be merged easily?

@Thinkersbluff
Author

How does the CF work? Does it merge in upstream Marlin changes? Is it similar enough that upstream PRs could be merged easily?

I believe there is a PR being actively worked now, between @thinkyhead and @Sebazzz to merge the CF fork back into the mainstream. That was the goal of the CF project from the beginning. No idea how long it might take to complete that merge though… Although it may leave support for the stock TFT display on its own branch, most of the CF fork is still Marlin and there is an unreleased extui branch on the CF fork that @Sebazzz has been updating with Marlin PRs.

There may be other CF users who can still reproduce the problem. I believe at least one of the original Facebook “gang” commented on that issue and may still be monitoring it for updates. Maybe you can find a useful partner by recruiting on Community Firmware GitHub issue#248

@Thinkersbluff
Author

Some of what you describe sounds like #22893.

Indeed, I believe I also had that problem when I briefly experimented with 2.0.9.2. I did not have the problem described here, but I could not get Filament Load/Unload to work because the firmware kept waiting for the nozzle temperature to stabilize. In the end, I swapped the PT1000 out, and the problematic behaviour “disappeared”.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators May 24, 2022