Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[automation] memory leak caused by bad rule in UI #2200

Closed
morph166955 opened this issue Feb 15, 2021 · 47 comments · Fixed by #2212
Closed

[automation] memory leak caused by bad rule in UI #2200

morph166955 opened this issue Feb 15, 2021 · 47 comments · Fixed by #2212

Comments

@morph166955
Copy link

Tested against S2207 but reported on https://community.openhab.org/t/openhab-3-runs-out-of-memory-java-heap-space-errors-cpu-100-after-a-few-hours/110924/105 as happening between 3.0 and 3.0.1

As identified by @mhilbush a rule similar to the rule below, but created on the new rules UI. This is 100% reproducible. I created an Ubuntu 20.04 VM (on ESXi) with 4GB ram. I only installed the systeminfo binding to monitor system use over time. I removed the rrd4j persistence as it was thought to be causing the problem at once point (it wasn't). There is nothing on this system other than a single thing created as per the config below.

rule "Test One per Second"
when
    Time cron "* * * * * ? *"
then
    xxxxxxx
    var Number n = 1
end

Causes the following:

2021-02-15 03:25:52.728 [DEBUG] [e.automation.internal.RuleEngineImpl] - The rule 'aeefe05726' is executed.
2021-02-15 03:25:53.673 [DEBUG] [e.automation.internal.RuleEngineImpl] - The trigger '1' of rule 'aeefe05726' is triggered.
2021-02-15 03:25:53.735 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'aeefe05726' failed:      ___ xxxxxxx
    var Number n = 1

   The method or field xxxxxxx is undefined; line 1, column 4, length 7
javax.script.ScriptException:      ___ xxxxxxx
    var Number n = 1

   The method or field xxxxxxx is undefined; line 1, column 4, length 7
        at org.openhab.core.model.script.runtime.internal.engine.DSLScriptEngine.eval(DSLScriptEngine.java:127) ~[?:?]
        at org.openhab.core.automation.module.script.internal.handler.ScriptActionHandler.lambda$0(ScriptActionHandler.java:62) [bundleFile:?]
        at java.util.Optional.ifPresent(Optional.java:183) [?:?]
        at org.openhab.core.automation.module.script.internal.handler.ScriptActionHandler.execute(ScriptActionHandler.java:59) [bundleFile:?]
        at org.openhab.core.automation.internal.RuleEngineImpl.executeActions(RuleEngineImpl.java:1179) [bundleFile:?]
        at org.openhab.core.automation.internal.RuleEngineImpl.runRule(RuleEngineImpl.java:987) [bundleFile:?]
        at org.openhab.core.automation.internal.TriggerHandlerCallbackImpl$TriggerData.run(TriggerHandlerCallbackImpl.java:89) [bundleFile:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]

Over time, this failure seems to cause a memory leak leading to an eventual system crash due to OutOfMemory

2021-02-15 02:22:34.305 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.306 [WARN ] [e.jetty.util.thread.QueuedThreadPool] -
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.308 [ERROR] [org.apache.felix.fileinstall        ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:26:05.103 [WARN ] [e.jetty.util.thread.QueuedThreadPool] -
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:26:05.103 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space

systeminfo thing and items created as

openhab@oh3-test-vm:~$ cat /etc/openhab/things/systeminfo.things
Thing systeminfo:computer:test [interval_high=1, interval_medium=1]

openhab@oh3-test-vm:~$ cat /etc/openhab/items/systeminfo.items
String CPU_Name                   "Name"                <none>           { channel="systeminfo:computer:test:cpu#name" }
String CPU_Description            "Description"         <none>           { channel="systeminfo:computer:test:cpu#description" }
Number CPU_Load1                  "Load (1 min)"        <none>           { channel="systeminfo:computer:test:cpu#load1" }
Number CPU_Load5                  "Load (5 min)"        <none>           { channel="systeminfo:computer:test:cpu#load5" }
Number CPU_Load15                 "Load (15 min)"       <none>           { channel="systeminfo:computer:test:cpu#load15" }
Number CPU_Threads                "Threads"             <none>           { channel="systeminfo:computer:test:cpu#threads" }
Number CPU_Uptime                 "Uptime"              <time>           { channel="systeminfo:computer:test:cpu#uptime" }
Number Memory_Available           "Available"           <none>           { channel="systeminfo:computer:test:memory#available" }
Number Memory_Used                "Used"                <none>           { channel="systeminfo:computer:test:memory#used" }
Number Memory_Total               "Total"               <none>           { channel="systeminfo:computer:test:memory#total" }
Number Memory_Available_Percent   "Available (%)"       <none>           { channel="systeminfo:computer:test:memory#availablePercent" }
Number Memory_Used_Percent        "Used (%)"            <none>           { channel="systeminfo:computer:test:memory#usedPercent" }

I attempted to fix this in #2182 thinking that this was because of the threadpools but that was not successful. I also attempted to back out #2057 and #1952 to see if either of those were the problem. This was also fruitless. I am at a loss on where to move further and can not identify where the leak is.

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-3-runs-out-of-memory-java-heap-space-errors-cpu-100-after-a-few-hours/110924/127

@mhilbush
Copy link
Contributor

Thanks for opening this.

Reference my notes here.
https://community.openhab.org/t/openhab-3-runs-out-of-memory-java-heap-space-errors-cpu-100-after-a-few-hours/110924/104

Here.
https://community.openhab.org/t/openhab-3-runs-out-of-memory-java-heap-space-errors-cpu-100-after-a-few-hours/110924/116

And here.
https://community.openhab.org/t/openhab-3-runs-out-of-memory-java-heap-space-errors-cpu-100-after-a-few-hours/110924/128

I always thought rules files containing syntax errors were ignored by the rule engine. But I see even in 2.5 that my simple rule executes every second. Of course, because 2.5 only has text-based DSL rules, it doesn't cause a memory consumption issue. That only happens with the UI version of DSL rules.

@morph166955
Copy link
Author

@mhilbush Was able to load the test jar I posted and had some interesting results. Apparently restarting of org.openhab.core.model.script.runtime caused the memory to free up. This at least gives us a path to look at for where the issue could be located.

1e646e446f0b8098142ef4e9b99d9b768f718529

@mhilbush
Copy link
Contributor

@kaikreuzer This can now be easily reproduced, so hopefully we can get to the root cause.

@mhilbush
Copy link
Contributor

Apparently restarting of org.openhab.core.model.script.runtime caused the memory to free up.

This makes complete sense to me as the resources associated with the bundle were freed up when the original bundle was uninstalled and the new one installed.

@mhilbush
Copy link
Contributor

Seems to me that something is different in how resources are managed after a failed rule execution between the text DSL rules and the UI DSL rules.

@morph166955
Copy link
Author

So the offending line is definitely

The call to scriptEngine.newScriptFromString() is failing and throwing an exception when the bad script is in place. The question that I have is why we aren't handling that properly (or what's not releasing). This is already in a try/catch so

should be throwing the proper exception back to ScriptActionHandler (which it looks like it's doing based on the error above). I'm wondering if the leak is in ScriptActionHandler and for some reason it's not releasing the memory for something when it gets an exception.

@mhilbush
Copy link
Contributor

Also, I would assume the issue has nothing to do with the change to cache the parsed script, since that commit was excluded from your test jar. Correct?

@morph166955
Copy link
Author

Correct. That said, I believe this would have happened before that as well since we've always called newScriptFromString.

@morph166955
Copy link
Author

Just to keep some notes as I go, I threw some debugs in and the exception is definitely being thrown by the line below. It make sense because it's a validation error.

throw new ScriptParsingException("Failed to parse expression (due to managed ValidationError/s)",

So the question at this point is, should we be freeing up anything in the else on L150 before we throw the error in the function below:

private XExpression parseScriptIntoXTextEObject(String scriptAsString) throws ScriptParsingException {
XtextResourceSet resourceSet = getResourceSet();
Resource resource = resourceSet.createResource(computeUnusedUri(resourceSet)); // IS-A XtextResource
try {
resource.load(new StringInputStream(scriptAsString, StandardCharsets.UTF_8.name()),
resourceSet.getLoadOptions());
} catch (IOException e) {
throw new ScriptParsingException(
"Unexpected IOException; from close() of a String-based ByteArrayInputStream, no real I/O; how is that possible???",
scriptAsString, e);
}
List<Diagnostic> errors = resource.getErrors();
if (!errors.isEmpty()) {
throw new ScriptParsingException("Failed to parse expression (due to managed SyntaxError/s)",
scriptAsString).addDiagnosticErrors(errors);
}
EList<EObject> contents = resource.getContents();
if (!contents.isEmpty()) {
Iterable<Issue> validationErrors = getValidationErrors(contents.get(0));
if (!validationErrors.iterator().hasNext()) {
return (XExpression) contents.get(0);
} else {
throw new ScriptParsingException("Failed to parse expression (due to managed ValidationError/s)",
scriptAsString).addValidationIssues(validationErrors);
}
} else {
return null;
}
}

@mhilbush
Copy link
Contributor

Interesting. But there's one thing I don't understand...

The exception message that's logged doesn't say what's in the ScriptParsingException message from line 150. And based on what's here, it seems like that's what the exception message should say.

@morph166955
Copy link
Author

I was just looking at that. The actual error log message is fired at:

I believe that is because of the way https://github.com/openhab/openhab-core/blob/main/bundles/org.openhab.core.model.script/src/org/openhab/core/model/script/engine/ScriptParsingException.java is created

@mhilbush
Copy link
Contributor

Yeah, I see.

So is parseScriptIntoXTextEObject only called when the rule is a UI DSL rule?

@morph166955
Copy link
Author

From what I can tell yes. When I grep for it I only get:

bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/ScriptEngineImpl.java:111: return newScriptFromXExpression(parseScriptIntoXTextEObject(scriptAsString));
bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/ScriptEngineImpl.java:129: private XExpression parseScriptIntoXTextEObject(String scriptAsString) throws ScriptParsingException {

@morph166955
Copy link
Author

Ok so more info. I got to looking at this to understand the logic of why we're doing a whole bunch of stuff to one but not the other.

if (script.stripLeading().startsWith(DSLScriptContextProvider.CONTEXT_IDENTIFIER)) {

I added logger.debug("DSLScriptEngine {}",script.stripLeading()); right before that line

For a text based rules file we get:

2021-02-15 19:40:12.645 [DEBUG] [time.internal.engine.DSLScriptEngine] - DSLScriptEngine // context: test2-1
yyyyyyyy
var Number m = 1

For a UI based rule we get:

2021-02-15 19:40:13.090 [DEBUG] [time.internal.engine.DSLScriptEngine] - DSLScriptEngine xxxxxxx
var Number n = 1

@morph166955
Copy link
Author

morph166955 commented Feb 15, 2021

Apparently I should have grepped for this instead...

bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/ScriptEngineImpl.java:111: return newScriptFromXExpression(parseScriptIntoXTextEObject(scriptAsString));
bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/ScriptEngineImpl.java:115: public Script newScriptFromXExpression(XExpression expression) {
bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/DSLScriptEngine.java:108: s = scriptEngine.newScriptFromXExpression(xExpression);
bundles/org.openhab.core.model.rule.runtime/src/org/openhab/core/model/rule/runtime/internal/RuleContextHelper.java:64: : scriptEngine.newScriptFromXExpression(var.getRight()).execute();

@morph166955
Copy link
Author

Text versus UI based evalutation and errors thrown

UI:

2021-02-15 19:54:47.355 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'aeefe05726' failed:      ___ xxxxxxx
    var Number n = 1

   The method or field xxxxxxx is undefined; line 1, column 4, length 7
javax.script.ScriptException:      ___ xxxxxxx
    var Number n = 1

   The method or field xxxxxxx is undefined; line 1, column 4, length 7
        at org.openhab.core.model.script.runtime.internal.engine.DSLScriptEngine.eval(DSLScriptEngine.java:128) ~[?:?]
        at org.openhab.core.automation.module.script.internal.handler.ScriptActionHandler.lambda$0(ScriptActionHandler.java:70) [bundleFile:?]
        at java.util.Optional.ifPresent(Optional.java:183) [?:?]
        at org.openhab.core.automation.module.script.internal.handler.ScriptActionHandler.execute(ScriptActionHandler.java:65) [bundleFile:?]
        at org.openhab.core.automation.internal.RuleEngineImpl.executeActions(RuleEngineImpl.java:1179) [bundleFile:?]
        at org.openhab.core.automation.internal.RuleEngineImpl.runRule(RuleEngineImpl.java:987) [bundleFile:?]
        at org.openhab.core.automation.internal.TriggerHandlerCallbackImpl$TriggerData.run(TriggerHandlerCallbackImpl.java:89) [bundleFile:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]

Text:

2021-02-15 19:54:48.149 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'test2-1' failed: The name 'yyyyyyyy' cannot be resolved to an item or type; line 5, column 5, length 8 in test2
javax.script.ScriptException: The name 'yyyyyyyy' cannot be resolved to an item or type; line 5, column 5, length 8 in test2
        at org.openhab.core.model.script.runtime.internal.engine.DSLScriptEngine.eval(DSLScriptEngine.java:128) ~[?:?]
        at org.openhab.core.automation.module.script.internal.handler.ScriptActionHandler.lambda$0(ScriptActionHandler.java:70) [bundleFile:?]
        at java.util.Optional.ifPresent(Optional.java:183) [?:?]
        at org.openhab.core.automation.module.script.internal.handler.ScriptActionHandler.execute(ScriptActionHandler.java:65) [bundleFile:?]
        at org.openhab.core.automation.internal.RuleEngineImpl.executeActions(RuleEngineImpl.java:1179) [bundleFile:?]
        at org.openhab.core.automation.internal.RuleEngineImpl.runRule(RuleEngineImpl.java:987) [bundleFile:?]
        at org.openhab.core.automation.internal.TriggerHandlerCallbackImpl$TriggerData.run(TriggerHandlerCallbackImpl.java:89) [bundleFile:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]

@morph166955
Copy link
Author

And the reason for that is because...

@Override
public Script newScriptFromString(String scriptAsString) throws ScriptParsingException {
return newScriptFromXExpression(parseScriptIntoXTextEObject(scriptAsString));
}
@Override
public Script newScriptFromXExpression(XExpression expression) {
ScriptImpl script = ScriptStandaloneSetup.getInjector().getInstance(ScriptImpl.class);
script.setXExpression(expression);
return script;
}

Text based rules call newScriptFromXExpression() where UI calls newScriptFromString()

@morph166955
Copy link
Author

So I think I see the problem, I'm just not sure how to fix it. When you create a rule in the UI no context is created for it. I'm going to keep going through the code to see how contexts are created when text rules are added to see if I can add something to create a context if one doesn't exist. I "think" that the fact that no context exists is causing the parsing attempt to be stored somewhere and every time the rule fires it attempts to parse and it just spins out of control eventually. Option B is to figure out how to dispose of it properly when it fails.

@morph166955
Copy link
Author

And I've made it to the end of the issue... maybe.

The offending line...

Every time it kicks, it creates a new context. Because the rules UI doesn't create a context, and because the generation fails, these go into oblivion for some reason. I've got a jar running right now to see if this is going to spin out of control but right now the system is just hovering inside of a few MB. Basically the fix was to set context to null at the top and then fire new SimpleScriptContext(); if context is null in two places. I also, probably overkill, set context = null before the script throws an exception just in the event something created it and it needs to be destroyed. If this stays stable, I'll make a clean jar without all my debug chaos in it and publish.

@morph166955
Copy link
Author

So there appears to be two leaks. Notice below there are periods where the memory usage stays basically stagnant. Then randomly it will jump up 5-10MB. Then calm. Then jump.

2021-02-15 21:44:26.234 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 841 to 842
2021-02-15 21:44:28.234 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 842 to 843
2021-02-15 21:44:30.236 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 843 to 844
2021-02-15 21:44:34.242 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 844 to 845
2021-02-15 21:44:38.247 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 845 to 846
2021-02-15 21:44:41.250 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 846 to 847
2021-02-15 21:44:46.258 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 847 to 848
2021-02-15 21:44:52.265 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 848 to 849
2021-02-15 21:44:57.271 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 849 to 850
2021-02-15 21:44:58.270 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 850 to 849
2021-02-15 21:44:59.271 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 849 to 850
2021-02-15 21:45:05.284 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 850 to 859
2021-02-15 21:45:12.291 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 859 to 850
2021-02-15 21:45:16.296 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 850 to 864
2021-02-15 21:45:17.297 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 864 to 860
2021-02-15 21:45:29.307 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 860 to 859
2021-02-15 21:45:31.309 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 859 to 860
2021-02-15 21:46:06.343 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 860 to 861
2021-02-15 21:46:07.347 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 861 to 860
2021-02-15 21:46:12.352 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 860 to 861
2021-02-15 21:46:15.354 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 861 to 860
2021-02-15 21:46:17.359 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 860 to 861
2021-02-15 21:46:26.383 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 861 to 862
2021-02-15 21:46:30.388 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 862 to 863
2021-02-15 21:47:00.414 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 863 to 864
2021-02-15 21:47:07.420 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 864 to 865
2021-02-15 21:47:09.422 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 865 to 864
2021-02-15 21:47:10.424 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 864 to 865
2021-02-15 21:47:38.451 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 865 to 866
2021-02-15 21:48:20.486 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 866 to 867
2021-02-15 21:48:39.506 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:48:57.520 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:49:08.526 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:49:10.529 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:49:29.541 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:49:34.543 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:49:39.546 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:49:45.550 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:49:47.553 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:49:48.553 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:49:49.555 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:49:59.561 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:50:00.561 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:50:40.588 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 871
2021-02-15 21:50:42.590 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 871 to 867
2021-02-15 21:50:43.590 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:50:46.593 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 867
2021-02-15 21:50:47.593 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:51:02.603 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 867
2021-02-15 21:51:11.618 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:51:12.619 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 867
2021-02-15 21:51:23.627 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:51:24.627 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 867
2021-02-15 21:51:27.628 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:51:43.639 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 867
2021-02-15 21:51:49.643 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:52:02.659 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 867
2021-02-15 21:52:07.661 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 867 to 868
2021-02-15 21:53:19.709 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:53:23.713 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:53:26.716 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:53:33.719 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:53:37.723 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:53:44.727 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:53:52.741 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:53:54.734 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:53:55.735 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:53:57.736 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:54:02.741 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:54:16.749 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:54:18.753 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:54:26.757 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:54:28.759 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:54:34.761 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 868
2021-02-15 21:54:36.763 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 868 to 869
2021-02-15 21:55:17.800 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:55:21.803 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:55:23.806 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:55:24.804 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:55:35.812 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:55:37.814 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:55:44.820 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:55:46.822 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:55:49.824 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:56:02.835 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:56:03.836 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:56:19.846 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 869
2021-02-15 21:56:21.848 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 869 to 870
2021-02-15 21:58:14.944 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 871
2021-02-15 21:58:15.944 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 871 to 870
2021-02-15 21:58:21.948 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 870 to 871
2021-02-15 21:58:24.951 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 871 to 873
2021-02-15 21:58:28.953 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 873 to 874
2021-02-15 21:59:10.987 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 874 to 875
2021-02-15 21:59:11.987 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 875 to 874
2021-02-15 21:59:37.004 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 874 to 875
2021-02-15 21:59:45.010 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 875 to 874
2021-02-15 21:59:52.021 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 874 to 875
2021-02-15 22:00:07.024 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 875 to 874
2021-02-15 22:00:08.027 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 874 to 875
2021-02-15 22:00:13.028 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 875 to 876
2021-02-15 22:00:26.040 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 876 to 877
2021-02-15 22:00:27.040 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 877 to 878
2021-02-15 22:00:37.065 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 878 to 879
2021-02-15 22:00:40.052 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 879 to 880
2021-02-15 22:00:48.058 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 880 to 881
2021-02-15 22:00:57.070 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 881 to 882
2021-02-15 22:01:03.077 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 882 to 883
2021-02-15 22:01:12.082 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 883 to 884
2021-02-15 22:01:17.086 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 884 to 883
2021-02-15 22:01:20.088 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 883 to 884
2021-02-15 22:01:23.090 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 884 to 885
2021-02-15 22:01:28.094 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 885 to 886
2021-02-15 22:01:37.105 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 886 to 887
2021-02-15 22:01:45.110 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 887 to 888
2021-02-15 22:02:00.123 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 888 to 889
2021-02-15 22:02:03.122 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 889 to 890
2021-02-15 22:02:16.135 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 890 to 891
2021-02-15 22:02:25.142 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 891 to 892
2021-02-15 22:02:33.149 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 892 to 893
2021-02-15 22:02:35.150 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 893 to 895
2021-02-15 22:02:38.153 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 895 to 896
2021-02-15 22:02:40.154 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 896 to 897
2021-02-15 22:02:42.156 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 897 to 898
2021-02-15 22:02:54.166 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 898 to 899
2021-02-15 22:02:57.168 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 899 to 900
2021-02-15 22:03:05.176 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 900 to 901
2021-02-15 22:03:10.180 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 901 to 902
2021-02-15 22:03:16.183 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 902 to 903
2021-02-15 22:03:18.186 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 903 to 904
2021-02-15 22:03:27.193 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 904 to 905
2021-02-15 22:03:37.200 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 905 to 906
2021-02-15 22:03:46.209 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 906 to 907
2021-02-15 22:03:50.214 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 907 to 908
2021-02-15 22:03:55.217 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 908 to 907
2021-02-15 22:03:56.218 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 907 to 908
2021-02-15 22:03:58.219 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 908 to 909
2021-02-15 22:04:03.236 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 909 to 908
2021-02-15 22:04:04.237 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 908 to 909
2021-02-15 22:04:07.239 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 909 to 910
2021-02-15 22:04:10.246 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 910 to 909
2021-02-15 22:04:16.250 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 909 to 910
2021-02-15 22:04:20.256 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 910 to 911
2021-02-15 22:04:27.265 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 911 to 912
2021-02-15 22:04:36.267 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 912 to 913
2021-02-15 22:04:46.299 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 913 to 914
2021-02-15 22:04:57.310 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 914 to 915
2021-02-15 22:05:06.319 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 915 to 916
2021-02-15 22:05:16.324 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 916 to 917
2021-02-15 22:05:27.332 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 917 to 918
2021-02-15 22:05:36.344 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 918 to 919
2021-02-15 22:05:40.341 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 919 to 920
2021-02-15 22:05:42.342 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 920 to 922
2021-02-15 22:05:47.345 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 922 to 923
2021-02-15 22:05:57.353 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 923 to 924
2021-02-15 22:06:07.358 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 924 to 925
2021-02-15 22:06:16.365 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 925 to 926
2021-02-15 22:06:17.366 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 926 to 927
2021-02-15 22:06:20.367 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 927 to 928
2021-02-15 22:06:25.371 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 928 to 927
2021-02-15 22:06:26.371 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 927 to 928
2021-02-15 22:06:33.376 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 928 to 929
2021-02-15 22:06:34.376 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 929 to 928
2021-02-15 22:06:35.381 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 928 to 929
2021-02-15 22:06:36.378 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 929 to 930
2021-02-15 22:06:37.382 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 930 to 929


@morph166955
Copy link
Author

morph166955 commented Feb 15, 2021

@mhilbush Please test the jar at:

https://github.com/morph166955/openhab-core/releases/tag/memoryleak2200-1

I've not pushed the code up to my branch yet. Looking to see if this helps the situation.

To note, the second memory leak seems to kick a few seconds after the following are triggered:

2021-02-15 23:35:42.012 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:36:42.013 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:36:44.183 [DEBUG] [est.sitemap.internal.SitemapResource] - Run clean SSE subscriptions job
2021-02-15 23:37:42.013 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:38:42.013 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:38:44.183 [DEBUG] [est.sitemap.internal.SitemapResource] - Run clean SSE subscriptions job
2021-02-15 23:39:42.014 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:40:42.014 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:40:44.183 [DEBUG] [est.sitemap.internal.SitemapResource] - Run clean SSE subscriptions job
2021-02-15 23:41:42.014 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:42:42.015 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:42:44.183 [DEBUG] [est.sitemap.internal.SitemapResource] - Run clean SSE subscriptions job
2021-02-15 23:43:42.015 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:44:42.015 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:44:44.183 [DEBUG] [est.sitemap.internal.SitemapResource] - Run clean SSE subscriptions job
2021-02-15 23:45:42.016 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:46:42.016 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:46:44.183 [DEBUG] [est.sitemap.internal.SitemapResource] - Run clean SSE subscriptions job
2021-02-15 23:47:42.017 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:48:42.017 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job
2021-02-15 23:48:44.183 [DEBUG] [est.sitemap.internal.SitemapResource] - Run clean SSE subscriptions job
2021-02-15 23:49:42.017 [DEBUG] [core.karaf.internal.FeatureInstaller] - Running scheduled sync job

Then for another reason I haven't figured out yet, about 15 minutes in, something else kicks and memory starts to build again.

@mhilbush
Copy link
Contributor

@morph166955 I haven't had a chance to try your jar yet.

However, I believe the problem is in the call to parseScriptIntoXTextEObject, specifically the reuse of the resource set using the getResourceSet method here.

https://github.com/openhab/openhab-core/blob/main/bundles/org.openhab.core.model.script.runtime/src/org/openhab/core/model/script/runtime/internal/engine/ScriptEngineImpl.java#L126

It appears to me that the resource set will grow on every call to parseScriptIntoXTextEObject. The problem was masked when the parsed rule was cached. However, the cached rule only applies to valid rules. An invalid rule will result in a call to parseScriptIntoXTextEObject on every invocation of the rule, which is why we see the memory grow only for invalid rules. I believe that if you back out that caching commit, memory will grow continuously, even with a valid rule.

@kaikreuzer WDYT?

morph166955 added a commit to morph166955/openhab-core that referenced this issue Feb 16, 2021
@morph166955
Copy link
Author

Interesting. I hadn't seen that and I would agree that could be causing a leak as well. There definitely is a second leak that I see kick on after the 15 minute mark. I would still appreciate if you can test the jar to see if it does anything on your system. My system stays very stable until that 15 minute point, and then it starts to spin off to oblivion. I suppose we could fix the leak you have identified simply by setting this.resourceSet = null; or just running deactivate(); before the exception is thrown? If you agree that this would resolve that issue I can add that to my branch and generate up a new test jar.

I pushed my changes to https://github.com/morph166955/openhab-core/tree/memoryleak2200 for review as well.

@mhilbush
Copy link
Contributor

mhilbush commented Feb 16, 2021

I made a simple change to prevent the resource set from being reused (see link to code comparison below). This probably isn't a proper fix, as it has pretty significant CPU impact due to the resource set being recreated on each invocation of an invalid rule. However, I wanted to see if it fixed the memory leak, which it did. OTOH, my change has a severe effect on CPU only when you have an invalid UI DSL rule that executes every second. 😉

I'll defer to @kaikreuzer on what to do as the proper long term fix.

main...mhilbush:ui-dsl-fix-mem-leak

invalid-rule-test

Edit: Been running stable for over 12 hours.

@morph166955
Copy link
Author

Interesting. Then why did me modifying the context code also have an impact on the memory usage? Are they somehow tied together?

@mhilbush
Copy link
Contributor

Not sure. I'm guessing that it might've reduced the amount of memory added to the resource set on each invalid rule execution. So, maybe it just took longer for the problem to show up. But that's just a guess.

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-3-runs-out-of-memory-java-heap-space-errors-cpu-100-after-a-few-hours/110924/132

@mhilbush
Copy link
Contributor

mhilbush commented Feb 16, 2021

Perhaps an option that would have less performance impact would be to call resource.unload() or resource.delete(Collections.emptyMap()) in parseScriptIntoXTextEObject before returning and throwing exceptions.

I'm running a version now that reuses the resource set and calls resource.delete. It has minimal performance impact, and memory utilization appears to be stable so far.

@morph166955
Copy link
Author

I have no issues with that. Perhaps we ultimately do both?

@cpmeister
Copy link
Contributor

After looking at the code an obvious problem I see is that the ScriptEngineImpl will create a new resource for every script the comes in, even if such a script is identical to something that has already been loaded.

My suggestion to solve that memory leak issue is to only create a new resource if it is in fact a new script.
Using the hashcode of the script string should be sufficient to almost guarantee uniqueness.

main...cpmeister:2200

@cpmeister
Copy link
Contributor

Perhaps an option that would have less performance impact would be to call resource.unload() or resource.delete(Collections.emptyMap()) in parseScriptIntoXTextEObject before returning and throwing exceptions.

According to the documentation resource.unload() doesn't actually remove the uri from the resource set.

@cpmeister
Copy link
Contributor

cpmeister commented Feb 16, 2021

I'm running a version now that reuses the resource set and calls resource.delete. It has minimal performance impact, and memory utilization appears to be stable so far.

I guess resource.delete is also an option. In fact it is probably the safest option as well. The only downside is the repeated overhead of having to load the resource each time a script is run.

@mhilbush
Copy link
Contributor

According to the documentation resource.unload() doesn't actually remove the uri from the resource set.

Yeah, I saw that too.

I like what you did in your approach. Not knowing too much about how this stuff works, I was steering clear of makeing too many changes...

I guess resource.delete is also an option. In fact it is probably the safest option as well. The only downside is the repeated overhead of having to load the resource each time a script is run.

Yes, it seems to work, and the overhead doesn't look too bad. Remember, the overhead is only incurred when parseScriptIntoXTextEObject is called, which is only once if the rule is valid.

@mhilbush
Copy link
Contributor

Thanks for your feedback, BTW!

@mhilbush
Copy link
Contributor

mhilbush commented Feb 16, 2021

So in several places I added a call to a deleteResource method that simply calls resource.delete(Collections.emptyMap()).

main...mhilbush:ui-dsl-mem-fix2

Which resulted in the following CPU and memory profile.

ui-rule

My test case of an invalid rule executing once per second is pretty severe. If you have that situation going on in your system, you probably have bigger problems. 😉 That's why I modeled once per 15 seconds, which also seems pretty severe, but the results are much more reasonable.

@morph166955
Copy link
Author

I'll defer to y'all here on the cause of the issue. I'll still submit a PR for the context memort issue "because it was found and there's no reason not to fix it".

@mhilbush
Copy link
Contributor

mhilbush commented Feb 17, 2021

I actually think the context issue is just a side-effect of the core issue. Probably because parts of the context are stored/referenced as part of the resource in the resource set.

@mhilbush
Copy link
Contributor

I think there's still a memory leak even for valid rules. Every time a rule is saved or disabled/enabled, I believe memory will be lost.

@kaikreuzer
Copy link
Member

Hey guys, sorry for not chiming in any earlier, but I have been sick this week and am only slowly recovering.

For the "invalid rule from UI" as per description of this issue, the fix main...mhilbush:ui-dsl-mem-fix2 seems to be just the right thing to do. Clearly the created resources leak and must be deleted. I have tested it locally and it definitely looks fixed by that change.

So @mhilbush, not to steal you your credit for the fix, may I ask you to create a PR with that?

If you both feel that there are other leaks for valid rules, I'd suggest to open a new issue with a new description for that.

@kaikreuzer
Copy link
Member

kaikreuzer commented Feb 19, 2021

Just ftr, here's my heap over the last hour with a rule execution (invalid one) every single second:

Screenshot 2021-02-19 at 23 03 39

@morph166955
Copy link
Author

Perhaps it's the novice understanding of how things are done in the memory here, but why are the spikes and dips so drastically apart for one rule? It looks like you go from ~45MB to ~125MB up and down. That just seems like a huge sawtooth for one rule no? No disagreement that this seems to fix the issue, and it looks like the context issue is definitely solved with this, I'm just looking at that diagram and then the one a few posts up by @mhilbush going from ~175MB to ~425MB and saying "wow, that seems like a huge amount of memory carved off just for a rule."

@mhilbush
Copy link
Contributor

I'll submit the PR later today.

And I'll open a separate issue for the leak from a valid rule.

@kaikreuzer
Copy link
Member

why are the spikes and dips so drastically apart for one rule

The number of rules doesn't matter here - the whole Xtext parsing is started every second and that simply involves many objects being instantiated, which can then later be garbage collected. I'd consider this as a quite normal behavior, since a lot of processing is indeed happening. For rules that are valid, we cache the parsed result, so in this case the parsing does not happen anymore - and that should be the normal situation.

@seaside1
Copy link

seaside1 commented Feb 21, 2021

@mhilbush any link for the issue with memory leak for the valid dsl rule?

@mhilbush
Copy link
Contributor

Issue for UI DSL valid rule memory leak is here.

#2213

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/memory-leak-using-jdbc-mysql-in-rules/119370/4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants