JDBC Sinks Stop Working as Intended after a Database Error #9464
Comments
Updated the title, since I tested this using a PostgreSQL sink connector and got the same result; I now believe this is a general issue with JDBC sinks.
Logs from the PostgreSQL sink connector, configured with all the same settings as were used with the ClickHouse sink connector, and using the same Kubernetes runtime.
Now believe that this issue is related to using the …
Thanks @Alxander64, and sorry for the late response. Have you tried the new Pulsar version 2.7.0, or 2.6.3? If the problem is still there, we need to fix it ASAP.
I have recently updated to 2.6.3, but since then I've only been running sinks against a more stable database. I first noticed this issue when sinking to ClickHouse, which I didn't have a great production setup for.
For a simple test, I had my Pulsar cluster in k8s and brought up a single Postgres replica with a Helm chart. I had a sink running, configured as described above, and then I deleted the pod running Postgres and waited for it to respawn. If new rows don't eventually populate in the target table, then the problem persists.
…the fatal exception (#21143)

PIP: #21079

### Motivation

Currently, the connector and function cannot terminate the function instance if fatal exceptions are thrown outside the function instance thread; the current exception handler in the connector and Pulsar Function framework cannot see them. For example, suppose we have a sink connector that uses its own threads to batch-sink data to an external system. If a fatal exception occurs in one of those threads, the function instance thread will not be aware of it and will not be able to terminate the connector. This causes the connector to hang indefinitely. There is a related issue here: #9464

The same problem exists for source connectors. A source connector may also use a separate thread to fetch data from an external system; if a fatal exception happens in that thread, the connector will likewise hang forever. This has been observed for the Kafka source connector: #9464. We fixed it by adding the notifyError method to the `PushSource` class in PIP-281: #20807. However, that does not solve the problem for all source connectors, because not all of them are implemented on top of the `PushSource` class.

The problem is the same for Pulsar Functions: currently, a function cannot throw fatal exceptions to the function framework, and we need to provide a way for function developers to do so. We need a way for connector and function developers to throw fatal exceptions outside the function instance thread, and the function framework should catch these exceptions and terminate the function accordingly.

### Modifications

Introduce a new method `fatal` to the context. Connector implementation code and function code can call the `fatal` method on this context to raise a fatal exception and terminate the instance.
After the connector or function raises the fatal exception, the function instance thread is interrupted. The function framework can then catch the exception, log it, and terminate the function instance.
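As a rough, self-contained illustration of that mechanism (not Pulsar's actual API: the class `FatalSignalDemo` and the `fatal` helper here are invented stand-ins), a background worker reports its fatal exception to the instance thread, which wakes up and terminates instead of hanging:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class FatalSignalDemo {
    // Stand-in for the `fatal` signalling channel the context would provide.
    static final CompletableFuture<Throwable> fatalSignal = new CompletableFuture<>();

    // Stand-in for context.fatal(t): wakes the instance thread with the error.
    static void fatal(Throwable t) {
        fatalSignal.completeExceptionally(t);
    }

    public static void main(String[] args) throws Exception {
        // Background sink/batch thread, analogous to a connector's own worker.
        Thread worker = new Thread(() -> {
            try {
                throw new IllegalStateException("connection refused by database");
            } catch (Exception e) {
                fatal(e); // report instead of dying silently
            }
        });
        worker.start();

        try {
            fatalSignal.join(); // instance thread blocks until a fatal error arrives
        } catch (CompletionException e) {
            // In the real framework this is where the instance would be terminated.
            System.out.println("terminating instance: " + e.getCause().getMessage());
        }
    }
}
```

The key point is that the worker's failure is propagated to the thread that owns the instance lifecycle, rather than being lost in a thread nobody is watching.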
Describe the bug
When running a ClickHouse JDBC sink and encountering an error from the database (e.g. a timeout), the sink seems to continue consuming, but stops actually inserting or acking any further messages.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The sink should recover, and be able to continue inserting and acking messages.
Logs
In the logs you can see that the sink logs its regular updates, then shows the error from the connection being refused by ClickHouse (for now this happens whenever ClickHouse restarts), and then the regular updates resume, looking just as they did before.
Screenshots
In this screenshot you can see a point where the backlog was accumulating: one instance of this error affecting the sink. The backlog then comes back down after I manually restarted the sink from the CLI, which got the sink running properly again. Later, another instance of this error occurred, and the backlog begins to accumulate again.
Additional context
Mentioned in the steps to reproduce:
Ideas
My working theory is that either something is wrong logically with the JDBC sinks, such that they stop working properly after encountering an error from the database, or something is wrong more specifically with the ClickHouse JDBC driver being used and it doesn't handle errors correctly.
I have not tested this with any other databases, but I imagine a quick test with either PostgreSQL or MySQL would reveal whether this is a general issue with the JDBC sinks.
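If the error-handling theory is right, the behavior the sink arguably needs looks roughly like the sketch below: retry a failed insert on a fresh connection, ack only after a successful write, and surface the error once retries are exhausted instead of silently consuming. This is a hypothetical illustration; `Writer`, `writeWithRetry`, and the connection list are invented for the example and are not the actual Pulsar JDBC sink code.

```java
import java.util.ArrayList;
import java.util.List;

public class RetryingSinkDemo {
    // Minimal stand-in for a JDBC connection that can insert one row.
    interface Writer { void insert(String row) throws Exception; }

    // Rows we have acknowledged back to the broker.
    static final List<String> acked = new ArrayList<>();

    // Try each available connection in turn; ack only once an insert succeeds.
    static void writeWithRetry(String row, List<Writer> connections) throws Exception {
        Exception last = null;
        for (Writer w : connections) {
            try {
                w.insert(row);
                acked.add(row); // ack only after a successful insert
                return;
            } catch (Exception e) {
                last = e;       // drop the broken connection, try a fresh one
            }
        }
        throw last;             // out of retries: fail loudly, don't hang
    }

    public static void main(String[] args) throws Exception {
        Writer broken  = row -> { throw new IllegalStateException("connection refused"); };
        Writer healthy = row -> { /* insert succeeds */ };
        writeWithRetry("row-1", List.of(broken, healthy));
        System.out.println("acked: " + acked);
    }
}
```

The failure mode described in this issue matches the opposite pattern: the error is logged, no reconnect happens, and messages keep being consumed without ever being inserted or acked.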