Thermal runaway audio-visual alarm #1509

Closed
nophead opened this issue Feb 16, 2015 · 43 comments
Labels
Needs: Work More work is needed T: Feature Request Features requested by users.

Comments

@nophead
Contributor

nophead commented Feb 16, 2015

While answering a forum question I came across this line:

if (temperature >= (target_temperature - hysteresis_degc))
  //reset the timer

It seems to me it should be:

if (temperature >= (target_temperature - hysteresis_degc) && 
    temperature <= (target_temperature + hysteresis_degc))
@alexborro
Contributor

@nophead, the idea is to keep the timer counting if the current temperature is below the target. I mean, if the system takes longer than "period_seconds" to bring the temperature up to the target, something is probably wrong and the system should be halted.

In your proposal, the timer will also keep counting when the temperature is above target + hysteresis, which makes no sense to me. We are trying to catch a situation where the heater is ON for a long time without hitting the target. If the temperature is above target + hysteresis, the heater should be OFF; that is probably just a misconfigured PID and presents no danger to the system.
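
For readers following along, the one-sided check described above can be sketched roughly as follows. This is a minimal illustration, not Marlin's actual code: `RunawayWatch`, `update`, and the millisecond timestamps are assumed names, and only `hysteresis_degc` and the idea of a period come from the thread.

```cpp
#include <cstdint>

// Hypothetical sketch of the one-sided check: the timer is reset whenever the
// temperature is at or above (target - hysteresis), and the watch trips once
// the temperature has stayed below that band for longer than the period.
struct RunawayWatch {
    float hysteresis_degc;    // tolerated shortfall below the target
    uint32_t period_ms;       // how long the shortfall may last
    uint32_t timer_start_ms;  // last time the temperature was inside the band

    RunawayWatch(float hysteresis, uint32_t period)
        : hysteresis_degc(hysteresis), period_ms(period), timer_start_ms(0) {}

    // Returns true when the heater should be halted.
    bool update(float temperature, float target_temperature, uint32_t now_ms) {
        if (temperature >= target_temperature - hysteresis_degc) {
            timer_start_ms = now_ms;  // close enough, or above: reset the timer
            return false;
        }
        // Still below the band: trip once the shortfall has lasted too long.
        return (now_ms - timer_start_ms) > period_ms;
    }
};
```

Note that any reading above the target resets the timer here, which is exactly the behaviour the two-sided condition proposed in the opening comment would change.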

If I misunderstood your proposal, let me know.

Cheers.

Alex.

@nophead
Contributor Author

nophead commented Feb 17, 2015

If it is above the target for a long time then that implies the heater is stuck on, which is classic thermal runaway in my book.

@nophead
Contributor Author

nophead commented Feb 17, 2015

Also, a reading that stays too low could mean the thermistor has fallen off, but it could also mean the fan is cooling the hot end too much. In that case "thermal runaway" is a confusing error message.

@alexborro
Contributor

Chris, if the temperature is above the target and the system can still read the temperature, it will take measures to turn the heater off.
In Thermal Runaway Protection we are trying to take care of some hardware issues, and the most common issue is a thermistor coming out of place. In that situation the measured temperature will be below the real one, so the heater will take longer to reach the target (or never reach it).

About the cooling fan, the user needs to calibrate the system correctly. In my experience, even a good cooling fan turned on at max power will not prevent a 40W heater from reaching the target in a few seconds.

Bear in mind this is not a feature to prevent 100% of the failure cases, just one more redundancy trying to prevent damage.

Cheers.

Alex.

@galexander1
Contributor

@nophead @alexborro
I think the question is whether any temperature sensors will provide a
high reading on failure? I see a few different failure modes...

  • pop out of heater block: it will read room temp (~20C) -- FAIL LOW
  • If a thermistor disconnects (electrical open), the 4.7k pullup will
    pull it to 5V (or 3.3V or whatever), which will make it into
    analog2temp() as raw==0xffff or similar, and it will return the last
    value in the table which is usually 0C -- FAIL LOW
  • If the sensor shorts, I think it will see 0V, analog2temp() will see
    raw==0, and it will return the first entry in the table (typically
    high max) -- FAIL HIGH

That's with your typical NTC thermistor. Disconnecting seems more common
than shorting, but shorting isn't impossible. And if you used a PTC
thermistor (none supported yet?), the short and disconnect scenarios
would be reversed. I don't know anything about thermocouples... And of
course my analysis of NTC thermistors could be wrong.

So I think @nophead's suggestion is a good idea. Sustained high readings
can indicate a thermistor failure that should shut down the robot.

It would be kind of neat if analog2temp() would set a specific failure
value (0C/1000C?) instead of just using the first/last entries in the
table, to make sure that off-scale readings don't look like max-scale
readings.
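
A rough sketch of that last idea, assuming a 10-bit ADC and an NTC thermistor on a pullup: off-scale readings get their own status instead of being clamped to the first or last table entry. The table values, `analog2temp_checked`, `Reading`, and `SensorStatus` are all made up for illustration and are not the firmware's real table or API.

```cpp
#include <cstddef>
#include <cstdint>

enum class SensorStatus { Ok, OpenCircuit, ShortCircuit };

struct Reading {
    SensorStatus status;
    float celsius;  // only meaningful when status == Ok
};

struct TableEntry { uint16_t raw; float celsius; };

// Made-up NTC table for a 10-bit ADC, sorted by rising raw value
// (with a pullup, a rising raw value means a falling temperature).
static const TableEntry kTable[] = {
    {  23, 300.0f },
    { 100, 200.0f },
    { 500, 100.0f },
    {1000,   0.0f },
};
static const size_t kTableLen = sizeof(kTable) / sizeof(kTable[0]);

Reading analog2temp_checked(uint16_t raw) {
    // Below the first entry: the input is near 0V, which suggests a short.
    if (raw < kTable[0].raw) return { SensorStatus::ShortCircuit, 0.0f };
    // Above the last entry: the pullup has won, which suggests an open circuit.
    if (raw > kTable[kTableLen - 1].raw) return { SensorStatus::OpenCircuit, 0.0f };

    // In range: linear interpolation between the two neighbouring entries.
    for (size_t i = 1; i < kTableLen; ++i) {
        if (raw <= kTable[i].raw) {
            const TableEntry& a = kTable[i - 1];
            const TableEntry& b = kTable[i];
            const float t = float(raw - a.raw) / float(b.raw - a.raw);
            return { SensorStatus::Ok, a.celsius + t * (b.celsius - a.celsius) };
        }
    }
    return { SensorStatus::OpenCircuit, 0.0f };  // not reached for in-range input
}
```

Returning a status alongside the temperature avoids picking a magic value such as 0C or 1000C, though a sentinel temperature would work too.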

@galexander1
Contributor

@alexborro said:

Chris, if the temperature is above the target and the system can still
read the temperature, it will take measures to turn the heater off.

Oh. Right. :) Context is everything! Please disregard my previous
message.

I think off-scale high is harmless because if the firmware can turn off
the heater, it already will. And if it can't, well, what are we gonna
do?!

@nophead
Contributor Author

nophead commented Feb 17, 2015

When I said "heater stuck on" I meant when the MOSFET is shorted due to an ESD strike punching through the gate oxide. Yes, the firmware will turn it off, but if the temperature is still rising when the power is off, that is thermal runaway and the user should be alerted even if it can't be shut down.
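
One way such a check might look, as a sketch only: watch for the temperature climbing over a time window while the firmware is commanding the heater off. `StuckOnDetector` and its parameters are hypothetical, and the window has to be long enough to ride out the normal overshoot just after the heater switches off.

```cpp
#include <cstdint>

// Hypothetical detector: flags a heater that keeps heating while commanded off.
class StuckOnDetector {
public:
    StuckOnDetector(float max_rise_degc, uint32_t window_ms)
        : max_rise_degc_(max_rise_degc), window_ms_(window_ms) {}

    // heater_commanded_on is what the firmware asked for, not what the
    // MOSFET is actually doing. Returns true when a stuck-on heater is suspected.
    bool update(bool heater_commanded_on, float temperature, uint32_t now_ms) {
        if (heater_commanded_on || !armed_) {
            // Heater on, or just switched off: (re)start the observation window.
            armed_ = !heater_commanded_on;
            start_temp_ = temperature;
            start_ms_ = now_ms;
            return false;
        }
        if (now_ms - start_ms_ < window_ms_) return false;  // window still open

        const bool rising = (temperature - start_temp_) > max_rise_degc_;
        start_temp_ = temperature;  // slide on to the next window
        start_ms_ = now_ms;
        return rising;
    }

private:
    float max_rise_degc_;
    uint32_t window_ms_;
    bool armed_ = false;
    float start_temp_ = 0.0f;
    uint32_t start_ms_ = 0;
};
```

What to do on a positive result (alert, cut a controllable PSU) is a separate question and is left out here.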

@galexander1
Contributor

If I'm sitting there and that happens, I'll probably smell it long before
I happen to look at the G-code console. :)

And if I'm not sitting there....bye bye hotend :(


@nophead
Contributor Author

nophead commented Feb 17, 2015

The noticeable thing would be the machine stops printing before the hot end starts to smoke. If the PSU is controllable then it could be switched off.

You seem happy to have a "thermal runaway protection" that ignores the most classic interpretation of thermal runaway i.e. getting too hot, for the sake of not adding a second clause to an expression.

It seems wrong to me but as I don't use it myself I don't really care. I use beefy MOSFETs and since I started cementing in my thermistors I have never had a case of thermal runaway. I also don't use 40W heaters, so the chance of a fire is small enough for me to not worry about it.

@galexander1
Contributor

galexander1 commented Feb 17, 2015 via email

@nophead
Contributor Author

nophead commented Feb 17, 2015

If you reduce the temperature setting it won't immediately error because it waits for the specified time to elapse. It is just the same situation as if you increase the temperature setting now. It really needs to go back to state 1 when the set point changes and move to state 2 when it crosses the set point.
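
The two-state behaviour described here could be sketched like this. It is an illustration under assumptions, not the firmware's implementation: the names `TrState`, `TrMachine`, `tr_init`, and `tr_update` are invented, "state 1" is modelled as `Approaching` and "state 2" as `Monitoring`, and the two-sided band in the monitoring state follows the condition proposed at the top of the thread.

```cpp
#include <cmath>
#include <cstdint>

enum class TrState { Approaching, Monitoring, Tripped };

struct TrMachine {
    TrState state;
    float target;             // set point currently being tracked
    bool from_below;          // approach direction when the set point changed
    uint32_t timer_start_ms;  // last time the temperature was inside the band
};

// State 1: a new set point, not yet reached.
TrMachine tr_init(float temp, float target, uint32_t now_ms) {
    return TrMachine{ TrState::Approaching, target, temp < target, now_ms };
}

TrState tr_update(TrMachine& m, float temp, float target,
                  float hysteresis_degc, uint32_t period_ms, uint32_t now_ms) {
    if (m.state == TrState::Tripped) return m.state;  // latched until re-initialised

    if (target != m.target) {
        m = tr_init(temp, target, now_ms);            // set point changed: back to state 1
    }

    if (m.state == TrState::Approaching) {
        const bool crossed = m.from_below ? (temp >= m.target) : (temp <= m.target);
        if (crossed) {                                // crossed the set point: state 2
            m.state = TrState::Monitoring;
            m.timer_start_ms = now_ms;
        }
        return m.state;
    }

    // State 2: the temperature must not leave the band for longer than the period.
    if (std::fabs(temp - m.target) <= hysteresis_degc) {
        m.timer_start_ms = now_ms;
    } else if (now_ms - m.timer_start_ms > period_ms) {
        m.state = TrState::Tripped;
    }
    return m.state;
}
```

As written, state 1 has no time limit of its own, so a heater that never reaches the set point is not caught by this sketch.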

Most failures are caused by using tape or silicone to fix the thermistor. It should be cemented or screwed in. If the connector fails you should get a MINTEMP error and if it shorts you get MAXTEMP.

The most common type of failure with a MOSFET is to go short circuit through ESD damage, unsuppressed transients, or getting too hot due to not being switched fully on. Looks like one here: http://forums.reprap.org/read.php?13,471647,471647#msg-471647. Note this is a cheap Chinese Melzi. The ones I supply have very sturdy MOSFETs and I have never known one to fail.

@daid
Contributor

daid commented Feb 17, 2015

Problem with this whole protection is that it doesn't really protect against all possible failure cases. Not even after the changes proposed by nophead.

If you want to do it right, you need to analyze every possible "single failure" scenario. And be sure to have protection against that.

One of the cases where all protection falls short is where the heater moves out of the hotend. This can happen in rare cases where the heater is not properly secured, but then poses a high risk, as this is only detected during heating up, not during actual printing.

@galexander1
Contributor

galexander1 commented Feb 17, 2015 via email

@daid
Contributor

daid commented Feb 17, 2015

The 25W heater found in the UM2 and the 40W heater in the UMO can both ignite PLA. Could not manage to ignite ABS with the UM2 heater.

But I don't think it will trigger the protection implemented here, as it will stay stuck in state 1.

@galexander1
Contributor

galexander1 commented Feb 17, 2015 via email

@alexborro
Contributor

@nophead, your point is valid and good. A shorted MOSFET is not a common issue but it happens - actually I have had some in my life. This proposed change will warn about it and, if the board is controlling the power supply, shut the system down - I usually have an SSR powering my systems, so it's easy to shut the whole system down.

@daid, as I said before, this is one more redundancy trying to avoid danger. There is not a single 100% safe system in the whole world; otherwise we would have no more airplane crashes. No matter how many safety devices protect a system, there will always be a way for it to fail.
I just don't know why people blame protection devices/routines just because they cannot save the system in 100% of cases. I'm happy saving just one house from going up in flames instead of spending years planning "The Great Safety Device" that never makes it off the paper.

@galexander1, we can add a time limit for the initial heating as well; that is pretty easy indeed. But people will mess around with it. They get confused by the way it is today; if we make it a little more complex, I suspect they will just turn it off. People - including me - watch at least the first layer and then leave the printer alone for hours, so it is not a big deal.

Cheers.

Alex.

@daid
Contributor

daid commented Feb 18, 2015

@alexborro You cannot have 100% safety. But in some industries (I come from traffic light systems & safety) you get damn close. What you do is a "cause and effect analysis" to see what happens if a single component fails. If the failure causes a critical problem (in this case, a burning printer), you should have some detection mechanism in place.
Then, for all the single errors you do not detect, you check what happens if you combine those with other single errors that you don't detect. If a combination there also causes a critical problem, you once again add some form of protection.

However, due to all the hardware variations in RepRap land, this is difficult.

And, a burned down house is a critical failure. Failed print is a minor failure. (And there are a few reported cases of burned down printers, and a reported case of a burned down house which quite possibly was the printer)

@alexborro
Contributor

@daid, you have got my point. Many users assemble their own machines and will not spend more money on safety devices like smoke detectors, redundant thermometers, etc. On the other hand, software is free; they just enable a feature. Of course it cannot protect against all cases, but it's free and can protect against some.
A friend of mine had a room in his house burn because a thermistor came out of place. Thank God only the things in the room burned - the houses in my country are made of ceramic bricks, not wood like in the US. But it was considerable damage, and it motivated me to write this feature - it could prevent such pain.

If you guys have new ideas to improve it, let me know. Soon I will implement the change proposed by @nophead .

Cheers.

Alex.

@daid
Contributor

daid commented Feb 18, 2015

@alexborro The number of machines sold with Marlin on them (as complete machines) dwarfs the number of self-built machines.

For the UM2, I look at the pidoutput and check whether it has been at full power for a long time. If it has, and I do not see a temperature increase, then something must be wrong. This catches a lot of cases.
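
A sketch of how a check along those lines could be structured, with the caveat that it is not the UM2 firmware's code: `FullPowerStallDetector`, the 0-255 output range, and the thresholds are all assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical check: the PID output has been pinned at full power for a
// whole window, yet the temperature rose by less than a minimum amount.
class FullPowerStallDetector {
public:
    FullPowerStallDetector(uint8_t full_power, float min_rise_degc, uint32_t window_ms)
        : full_power_(full_power), min_rise_degc_(min_rise_degc), window_ms_(window_ms) {}

    // Returns true when the heater appears to be heating without effect.
    bool update(uint8_t pid_output, float temperature, uint32_t now_ms) {
        if (pid_output < full_power_) {  // not at full power: nothing to judge
            timing_ = false;
            return false;
        }
        if (!timing_) {                  // just reached full power: open a window
            timing_ = true;
            start_temp_ = temperature;
            start_ms_ = now_ms;
            return false;
        }
        if (now_ms - start_ms_ < window_ms_) return false;

        const bool stalled = (temperature - start_temp_) < min_rise_degc_;
        start_temp_ = temperature;       // slide on to the next window
        start_ms_ = now_ms;
        return stalled;
    }

private:
    uint8_t full_power_;
    float min_rise_degc_;
    uint32_t window_ms_;
    bool timing_ = false;
    float start_temp_ = 0.0f;
    uint32_t start_ms_ = 0;
};
```

Unlike a check that only starts after the set point is reached, this one also runs during the initial heat-up.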

@msutas
Contributor

msutas commented Feb 18, 2015

I prefer the thermal runaway to be as it is right now. My printers have a 750 watt bed heater and the thermistor placed on the side of the bed. It overshoots the target temperature by 10-15 degrees in bangbang control. This means that, to avoid a false thermal runaway alert for the bed, I need to set a wide gap for thermal runaway, which increases the risk by alerting late when the thermistor is loose.

The controller stops heating if the temperature is above the target. When there is a problem with the MOSFET, throwing a thermal runaway alert would not stop the heater, so there would be no benefit if the robot was left alone during a print. If the operator is frequently checking the print, the MOSFET fault should be noticed anyway.

If the change is agreed on, I believe it would be beneficial to have separate configurable limits for upward and downward thermal runaway.
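
Separate limits might be expressed as two independent margins, as in the sketch below. `RunawayLimits`, `Deviation`, and `classify` are hypothetical names, and the numbers used with them are only examples, not existing configuration options.

```cpp
enum class Deviation { InBand, TooLow, TooHigh };

// Hypothetical per-heater configuration with independent margins.
struct RunawayLimits {
    float below_degc;  // tolerated shortfall below the target
    float above_degc;  // tolerated overshoot above the target
};

// Classifies a reading against an asymmetric band around the target.
Deviation classify(float temperature, float target, const RunawayLimits& lim) {
    if (temperature < target - lim.below_degc) return Deviation::TooLow;
    if (temperature > target + lim.above_degc) return Deviation::TooHigh;
    return Deviation::InBand;
}
```

A bed that overshoots by 10-15 degrees could then use something like `{4.0f, 20.0f}`: a tight lower margin to catch a loose thermistor early and a wide upper margin to tolerate the overshoot.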

@nophead
Contributor Author

nophead commented Feb 18, 2015

I don't think it checks the bed for runaway, just the extruder heaters.

If you have bed that is not self limiting you should put a thermal cutout
in series with it.


@alexborro
Contributor

@msutas Bear in mind there is also a period of time for which the temperature needs to be over the threshold.
I usually set 40 seconds on my printers. I think it is quite difficult for your bed to stay 10ºC over the target for 40 seconds; check it out. I usually set my bed to 110ºC for ABS, and if I turn it off, it drops to 100ºC within 20 seconds.

Cheers.

Alex.

@galexander1
Contributor

galexander1 commented Feb 18, 2015 via email

@thinkyhead
Member

Who's for adding an audible "fire alarm" to the LCD code that will go off in cases of bad thermal runaway?

@CONSULitAS
Contributor

👍 🎱

@avluis
Contributor

avluis commented Feb 21, 2015

@thinkyhead Was just thinking about this, use the buzzer (piezo speaker, whatever the LCD has) as the alarm when thermal runaway protection has been triggered.

@ntoff

ntoff commented Feb 21, 2015

Who's for adding an audible "fire alarm" to the LCD code that will go off in cases of bad thermal runaway?

Why not just an alarm that goes off whenever there's any kind of error at all?

Also, for high temperature detection, such as when the MOSFET is stuck on, what good will just detecting it do? If the MOSFET is stuck on then there will be no way for the printer to do anything other than alert the user through a message, so detecting it would mean very little. The only situation I can think of would be to wire up an ATX PSU and have the RAMPS board able to cut power to it so it just shuts the printer down entirely.

@thinkyhead
Member

@ntoff Something like a "dead man's switch" …

@clefranc
Contributor

@thinkyhead Yes! Trigger the buzzer please...
@ntoff Yes! Shutdown the PSU too...

http://wavs.unclebubby.com/wav/TREK/Computer/audesarm.wav

@thinkyhead thinkyhead added the T: Feature Request Features requested by users. label Feb 24, 2015
@thinkyhead thinkyhead added the Needs: Work More work is needed label Feb 24, 2015
@thinkyhead
Member

For users lacking an LCD controller, perhaps we can blink the status LED on the electronics board in some attention-grabbing way.

@thinkyhead thinkyhead changed the title from "Thermal runaway protection looks wrong" to "More intelligent thermal runaway protection" Feb 24, 2015
@avluis
Contributor

avluis commented Feb 24, 2015

@thinkyhead LED sounds good for those without an LCD, but in most cases I believe the electronics would be mounted on the frame, which for some makes the LED not too visible.
If the user has fans, how about blasting them to full, then low, and repeating; or moving the X & Y axes 5 - 10mm in both directions and repeating, etc.?
Something that would really make someone PANIC!
It will definitely get their attention, and they will know right away that something is wrong.

@thinkyhead
Member

Haha. We could use motor ringing vibrations to play a tune on one of the axis motors... The Imperial March from Star Wars, I suppose...

@daid
Contributor

daid commented Feb 24, 2015

Would you really want to risk shaking a machine apart and people reporting errors like "my machine is shaking, your code is broken!1!"?

@thinkyhead
Member

@daid Haha, no, truly! I should be more clear that I don't favor abusing the motors in this way. We should signal in all the appropriate ways, though - audible if possible, a message to the LCD, and a distinct error message to the Serial Out so the host can respond.

@avluis
Contributor

avluis commented Feb 24, 2015

@thinkyhead Actually, can hosts play audio if requested via Serial?
This could be the way to go for those with a computer connected to their printer. I know Repetier will play audio where requested in the G-code, but I know of no host that can play sound on demand from the printer - this could be a good way to have an alarm~

@thinkyhead
Member

@avluis I expect that, more likely, we will expand the Serial Protocol (and document it) to include a distinct message for a thermal shutdown situation, with a recommendation to alert users with an audible signal, and then hosts can choose the sound they prefer.

@ntoff

ntoff commented Feb 26, 2015

@thinkyhead My RC car used to play a tune using the motor whenever it started up. You'd count the beeps and things. Surely some high pitched noises wouldn't be that bad? Or aren't the Atmel / driver chips fast enough for higher frequencies?

@ZetaPhoenix
Contributor

ZetaPhoenix commented Feb 26, 2015 via email

@thinkyhead
Member

@ZetaPhoenix I guess we have to come up with a consensus on how much rotation would be acceptable, and which axis to favor. Users flash their own firmware and choose their own options, so presumably they will be informed about the potential for this to occur. And then, well, it would be "cute" to include this as a general feature - an alternative version of the "beep" command for units with no controller speaker.

@ntoff

ntoff commented Feb 28, 2015

I was only thinking of it in terms of some kind of emergency alert thing, rather than a beeper replacement. I would think the Z axis motors would be the best choice since they're almost guaranteed to be screwed to the frame which would amplify the sounds they made wouldn't it? Plus if you were printing and watching it on a web cam and saw the Z axis had moved up 10mm away from the print before playing the alert sound, it would also be a visual indicator that something had gone wrong.

You mentioned blinking LED's but some printers are enclosed with the electronics not so visible and I'm so used to seeing the LED's blink because of PID, I'd probably just ignore a blinking LED thinking it was just the PID doing its job.

@thinkyhead
Member

@ntoff Well, any blinking would be some obvious regular flashing pattern, 3-4 Hz.
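
A regular pattern like that is easy to derive from a millisecond clock. The sketch below assumes a hypothetical `alarm_led_on` helper that the main loop would call with the current time; pin handling is left out.

```cpp
#include <cstdint>

// Returns true while the alarm LED should be lit. blink_hz is the number of
// full on/off cycles per second (for example 4 for a 4 Hz flash).
inline bool alarm_led_on(uint32_t now_ms, uint32_t blink_hz) {
    if (blink_hz == 0 || blink_hz > 500) return true;  // out of range: solid on
    const uint32_t half_period_ms = 1000u / (2u * blink_hz);
    // Even half-periods are "on", odd half-periods are "off".
    return ((now_ms / half_period_ms) % 2u) == 0u;
}
```

At 4 Hz this gives 125 ms on and 125 ms off, which is hard to mistake for the irregular flicker of a heater LED under PID control.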

@boelle boelle modified the milestones: Feature Requests Round 12, Feature Requests Round 13 Apr 1, 2015
@thinkyhead thinkyhead changed the title from "More intelligent thermal runaway protection" to "Thermal runaway audio-visual alarm" May 9, 2015
@boelle boelle modified the milestones: Feature Requests Round 3, Feature Requests Round 8_ Jun 29, 2015
@boelle
Contributor

boelle commented Jun 29, 2015

I will close this one. There is no 100% way to get around this problem.
Example:

I start a 24-hour or longer print (yes, I have done it a few times). Even if the LCD panel (which I don't have) starts to flash and blink and beep like hell, there is no way that will wake me up across the flat through 2 closed doors anyway.

The only "close to" 100% solution is to fit a thermal fuse that breaks the connection at, say, 247 degrees, where the PEEK would start to soften, or even lower, as I rarely go above 190 anyway.

Another reason is that it will just add complexity, and what we have is enough for most cases.

@boelle boelle closed this as completed Jun 29, 2015
@boelle boelle modified the milestone: Feature Requests Round 3 Jun 29, 2015
@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2022