Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[influxdb] Unhandled InfluxDBException (timeout) #15443

Closed
Marco-Kawon opened this issue Aug 17, 2023 · 9 comments · Fixed by #15452
Closed

[influxdb] Unhandled InfluxDBException (timeout) #15443

Marco-Kawon opened this issue Aug 17, 2023 · 9 comments · Fixed by #15452
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@Marco-Kawon
Copy link

Marco-Kawon commented Aug 17, 2023

Expected Behavior

InfluxDB Persistance Service should not crash and recover from a InfluxDBException (timeout) exception.

Current Behavior

Hey there,
for whatever reason I am getting a timeout every few days from my influxdb when openhab wants to write data to it.

The issue is that once this error occurs, openHab stops writing to influxdb forever until I restart openhab, even if InfluxDB is working again. It does not recover from that failed state until openhab gets restarted.

A very similar issue was already discussed and handled here in this commit:
d2e10ab

My Problem is that only "InfluxException" and "InfluxDBIOException" are handled there.
But I am getting a "InfluxDBException", which is not catched here and then it crashes the persistance service.

2023-08-11 21:04:45.131 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
org.influxdb.InfluxDBException: timeout
	at org.influxdb.InfluxDBException.buildExceptionFromErrorMessage(InfluxDBException.java:161) ~[?:?]
	at org.influxdb.InfluxDBException.buildExceptionForErrorState(InfluxDBException.java:173) ~[?:?]
	at org.influxdb.impl.InfluxDBImpl.execute(InfluxDBImpl.java:837) ~[?:?]
	at org.influxdb.impl.InfluxDBImpl.write(InfluxDBImpl.java:470) ~[?:?]
	at org.openhab.persistence.influxdb.internal.influx1.InfluxDB1RepositoryImpl.write(InfluxDB1RepositoryImpl.java:133) ~[?:?]
	at org.openhab.persistence.influxdb.InfluxDBPersistenceService.commit(InfluxDBPersistenceService.java:285) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

Possible Solution

The "InfluxDBIOException" extends "InfluxDBException" as you can see here so I think it should be safe to replace the recently added catch of "InfluxDBIOException" by "InfluxDBException" to catch both of them. InfluxDB's write() method always throws InfluxDBExceptions.

Then I would just get a error thrown on console (DEBUG Level) and it would try again later until InfluxDB responds again ...

2023-08-18 15:46:08.829 [DEBUG] [rnal.influx1.InfluxDB1RepositoryImpl] - Writing to database failed
org.influxdb.InfluxDBIOException: java.net.SocketTimeoutException: timeout
	at org.influxdb.impl.InfluxDBImpl.execute(InfluxDBImpl.java:841) ~[?:?]

Context

Has been discussed here

Your Environment

  • openHAB 4.0.2
  • Openhabian on rpi4.
  • InfluxDB 1.8.x
@Marco-Kawon Marco-Kawon added the bug An unexpected problem or unintended behavior of an add-on label Aug 17, 2023
@jlaur jlaur changed the title [influxdb] Unhandled InfluxDBException (timeout) [influxdb] Unhandled InfluxDBException (timeout) Aug 17, 2023
@lnagel
Copy link
Contributor

lnagel commented Sep 3, 2023

I have been experiencing this too now a few times, resulting in the failure of the persistance service and resultingly the failure of my house heating. This issue seems to happen reliably at 06:30 in the morning, which happens to be the exact moment of when my influxdb backup is being executed. Timeline seems to be like this:

2023-09-02 06:25:02 influxdb backup started using the following command

influxd backup -portable /var/lib/openhab/backups/influxdb/influxdb-backup-20230902_062502

2023-09-02 06:30:44 stack trace in openhab.log

2023-09-02 06:30:44.580 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
org.influxdb.InfluxDBException: timeout
        at org.influxdb.InfluxDBException.buildExceptionFromErrorMessage(InfluxDBException.java:161) ~[?:?]
        at org.influxdb.InfluxDBException.buildExceptionForErrorState(InfluxDBException.java:173) ~[?:?]
        at org.influxdb.impl.InfluxDBImpl.execute(InfluxDBImpl.java:837) ~[?:?]
        at org.influxdb.impl.InfluxDBImpl.write(InfluxDBImpl.java:470) ~[?:?]
        at org.openhab.persistence.influxdb.internal.influx1.InfluxDB1RepositoryImpl.write(InfluxDB1RepositoryImpl.java:133) ~[?:?]
        at org.openhab.persistence.influxdb.InfluxDBPersistenceService.commit(InfluxDBPersistenceService.java:285) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-09-02 06:31:00.505 [INFO ] [utomation.openhab-js.hea

2023-09-02 06:30:59 influxdb backup finished, confirmed using the stat command

root@openhab:~# stat /var/lib/openhab/backups/influxdb/influxdb-backup-20230902_062502/20230902T032502Z.manifest
  File: /var/lib/openhab/backups/influxdb/influxdb-backup-20230902_062502/20230902T032502Z.manifest
  Size: 45725           Blocks: 9          IO Block: 46080  regular file
Device: 30h/48d Inode: 167709      Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-02 06:30:59.355615189 +0300
Modify: 2023-09-02 06:30:59.355615189 +0300
Change: 2023-09-02 06:30:59.355615189 +0300
 Birth: -

@J-N-K
Copy link
Member

J-N-K commented Sep 3, 2023

Fix is already proposed, but needs review and merge.

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/influxdb-persistence-stops-every-1-4-days/149446/3

@TheNetStriker
Copy link
Contributor

I'm currently also having problems with influxdb in OpenHab not reconnecting after a server reboot. I did just take a look at the changes in #15452 and it seems that the fix was only applied to the influxdb v1 implementation.

In #15062 the InfluxDBIOException was added to both files (influx1/InfluxDB1RepositoryImpl.java and influx2/InfluxDB2RepositoryImpl.java) but in #15452 the change from InfluxDBIOException to InfluxDBException was only made on the influx1/InfluxDB1RepositoryImpl.java.

I don't know if this is correct or if this should also be changed in the v2 implementation.

@r33a
Copy link

r33a commented Nov 10, 2023

Same here.
OH 4.0.4 and Influxdb 2.
Timeouts every few hours and no reconnect.
Save persistence settings for influxdb and it's working for the next couple of hours.

@J-N-K
Copy link
Member

J-N-K commented Nov 10, 2023

Are there any exceptions in the log?

@r33a
Copy link

r33a commented Nov 11, 2023

This is in the log:
2023-11-11 07:53:29.473 [ERROR] [.client.write.events.WriteErrorEvent] - The error occurred during writing of data com.influxdb.exceptions.InfluxException: interrupted at com.influxdb.client.internal.AbstractWriteClient.toInfluxException(AbstractWriteClient.java:546) ~[?:?] at com.influxdb.client.internal.AbstractWriteClient.lambda$new$12(AbstractWriteClient.java:178) ~[?:?] at io.reactivex.internal.subscribers.LambdaSubscriber.onNext(LambdaSubscriber.java:65) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableDoFinally$DoFinallySubscriber.onNext(FlowableDoFinally.java:84) ~[?:?] at io.reactivex.internal.operators.mixed.FlowableConcatMapMaybe$ConcatMapMaybeSubscriber.drain(FlowableConcatMapMaybe.java:284) ~[?:?] at io.reactivex.internal.operators.mixed.FlowableConcatMapMaybe$ConcatMapMaybeSubscriber.innerSuccess(FlowableConcatMapMaybe.java:179) ~[?:?] at io.reactivex.internal.operators.mixed.FlowableConcatMapMaybe$ConcatMapMaybeSubscriber$ConcatMapMaybeObserver.onSuccess(FlowableConcatMapMaybe.java:322) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeOnErrorNext$OnErrorNextMaybeObserver.onSuccess(MaybeOnErrorNext.java:88) ~[?:?] at io.reactivex.internal.operators.maybe.MaybePeek$MaybePeekObserver.onSuccess(MaybePeek.java:122) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap$MapMaybeObserver.onSuccess(MaybeMap.java:89) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableSingleMaybe$SingleElementSubscriber.onComplete(FlowableSingleMaybe.java:104) ~[?:?] at io.reactivex.subscribers.SerializedSubscriber.onComplete(SerializedSubscriber.java:168) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableRetryWhen$RetryWhenSubscriber.onComplete(FlowableRetryWhen.java:82) ~[?:?] at io.reactivex.internal.subscriptions.DeferredScalarSubscription.complete(DeferredScalarSubscription.java:134) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeToFlowable$MaybeToFlowableSubscriber.onSuccess(MaybeToFlowable.java:70) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap$MapMaybeObserver.onSuccess(MaybeMap.java:89) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap$MapMaybeObserver.onSuccess(MaybeMap.java:89) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeFromCallable.subscribeActual(MaybeFromCallable.java:61) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap.subscribeActual(MaybeMap.java:40) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap.subscribeActual(MaybeMap.java:40) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeToFlowable.subscribeActual(MaybeToFlowable.java:45) ~[?:?] at io.reactivex.Flowable.subscribe(Flowable.java:14935) ~[?:?] at io.reactivex.Flowable.subscribe(Flowable.java:14882) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableRepeatWhen$WhenReceiver.onNext(FlowableRepeatWhen.java:100) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.tryEmit(FlowableFlatMap.java:282) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onNext(FlowableFlatMap.java:668) ~[?:?] at io.reactivex.subscribers.SerializedSubscriber.onNext(SerializedSubscriber.java:100) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableDelay$DelaySubscriber$OnNext.run(FlowableDelay.java:114) ~[?:?] at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66) ~[?:?] at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?] at java.lang.Thread.run(Thread.java:840) ~[?:?] Caused by: java.io.InterruptedIOException: interrupted at okio.Timeout.throwIfReached(Timeout.java:146) ~[?:?] at okio.Okio$1.write(Okio.java:76) ~[?:?] at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) ~[?:?] at okio.RealBufferedSink.emitCompleteSegments(RealBufferedSink.java:179) ~[?:?] at okio.RealBufferedSink.write(RealBufferedSink.java:42) ~[?:?] at okhttp3.internal.http1.Http1ExchangeCodec$KnownLengthSink.write(Http1ExchangeCodec.java:324) ~[?:?] at okio.ForwardingSink.write(ForwardingSink.java:35) ~[?:?] at okhttp3.internal.connection.Exchange$RequestBodySink.write(Exchange.java:231) ~[?:?] at okio.RealBufferedSink.emitCompleteSegments(RealBufferedSink.java:179) ~[?:?] at okio.RealBufferedSink.write(RealBufferedSink.java:92) ~[?:?] at okhttp3.RequestBody$2.writeTo(RequestBody.java:151) ~[?:?] at retrofit2.RequestBuilder$ContentTypeOverridingRequestBody.writeTo(RequestBuilder.java:283) ~[?:?] at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:69) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:43) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[?:?] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[?:?] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[?:?] at com.influxdb.client.internal.GzipInterceptor.intercept(GzipInterceptor.java:91) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[?:?] at com.influxdb.client.internal.AuthenticateInterceptor.intercept(AuthenticateInterceptor.java:97) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[?:?] at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.java:152) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[?:?] at com.influxdb.internal.UserAgentInterceptor.intercept(UserAgentInterceptor.java:60) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[?:?] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[?:?] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[?:?] at okhttp3.RealCall.execute(RealCall.java:81) ~[?:?] at retrofit2.OkHttpCall.execute(OkHttpCall.java:188) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap$MapMaybeObserver.onSuccess(MaybeMap.java:82) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeFromCallable.subscribeActual(MaybeFromCallable.java:61) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap.subscribeActual(MaybeMap.java:40) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap.subscribeActual(MaybeMap.java:40) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeToFlowable.subscribeActual(MaybeToFlowable.java:45) ~[?:?] at io.reactivex.Flowable.subscribe(Flowable.java:14935) ~[?:?] at io.reactivex.Flowable.subscribe(Flowable.java:14882) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableRepeatWhen$WhenReceiver.onNext(FlowableRepeatWhen.java:100) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableRetryWhen.subscribeActual(FlowableRetryWhen.java:62) ~[?:?] at io.reactivex.Flowable.subscribe(Flowable.java:14935) ~[?:?] at io.reactivex.internal.operators.flowable.FlowableSingleMaybe.subscribeActual(FlowableSingleMaybe.java:34) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeMap.subscribeActual(MaybeMap.java:40) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybePeek.subscribeActual(MaybePeek.java:56) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.maybe.MaybeOnErrorNext.subscribeActual(MaybeOnErrorNext.java:46) ~[?:?] at io.reactivex.Maybe.subscribe(Maybe.java:4290) ~[?:?] at io.reactivex.internal.operators.mixed.FlowableConcatMapMaybe$ConcatMapMaybeSubscriber.drain(FlowableConcatMapMaybe.java:276) ~[?:?] ... 33 more

@J-N-K
Copy link
Member

J-N-K commented Nov 11, 2023

Nothing else? What exactly is in the log before?

@J-N-K
Copy link
Member

J-N-K commented Nov 11, 2023

See #15879

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants