Source Kafka: polling records cause OutOfMemory #14382
Comments
@tunguyen-12 agree with your suggestion. Looks like the connector is using other logic to stop the sync, which can cause the problem. I added this issue to the connector team roadmap for future implementation. (See lines 95 to 101 in 59e20f2.)
Can you give more information about the server instance you're running Airbyte on and the collection size in Kafka, so the team can reproduce this easily?
Hi @marcosmarxm, Here is the information about my deployment
@marcosmarxm Could you give me more details about the problem you mentioned: "Looks like the connector is using other logic to stop the sync which can cause the problem"?
From the code pasted here, it looks like the connector tries to read all records in the collection, stops afterward, and then flushes to the destination. That is not going to work for your use case with fast message generation. This is probably the reason for the OOM.
My idea is reading by micro-batch: implement a LazyIterator with an abstract List load() function. When hasNext() is called, return true if there is an available message in the global queue; if not, call the load() function. My concern: will this cause OOM in the worker?
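The micro-batch idea described above could be sketched as follows. This is a hypothetical illustration, not the connector's actual code; the class name `LazyIterator`, the `load()` method, and the internal queue are taken from the proposal's wording:

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;

// Sketch of the proposed micro-batch reader: records are buffered in a small
// queue and refilled on demand, so only one batch is held in memory at a time
// instead of the whole collection.
public abstract class LazyIterator<T> implements Iterator<T> {

  private final Queue<T> buffer = new ArrayDeque<>();

  // Load the next micro-batch; return an empty list when the source is drained.
  protected abstract List<T> load();

  @Override
  public boolean hasNext() {
    if (!buffer.isEmpty()) {
      return true;
    }
    buffer.addAll(load());
    return !buffer.isEmpty();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return buffer.poll();
  }
}
```

Because `load()` is only called once the buffer is empty, worker memory is bounded by one batch rather than the full topic, which addresses the OOM concern as long as each batch is kept small.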
I think it's safer compared to the logic implemented today.
Yeah, I have read the Debezium flow and applied the same logic to the Kafka flow, and it works like a charm. I'll consider submitting a PR in the near future.
IMHO, a record count limit has a few shortcomings which prevent utilizing an expectedly long-running task. Feels like this could be managed by tracking total fetch size (not the number of records), if the problem is OOM? Still, cutting the job short is an issue.
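The byte-based alternative suggested here could look roughly like this. It is only a sketch under the assumption that record size is approximated by the UTF-8 payload length; the class and parameter names are illustrative, not connector settings:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: stop a fetch cycle once the accumulated payload size
// crosses a byte budget, instead of counting records.
public final class ByteBudgetedReader {

  public static List<String> readUpTo(Iterator<String> records, long maxBytes) {
    final List<String> batch = new ArrayList<>();
    long consumed = 0;
    while (records.hasNext() && consumed < maxBytes) {
      final String record = records.next();
      consumed += record.getBytes(StandardCharsets.UTF_8).length;
      batch.add(record);
    }
    return batch;
  }
}
```

A byte budget bounds worker memory directly, whereas a record count only bounds it indirectly (large records can still blow the heap), which is the shortcoming the comment points at.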
Is there any workaround for the issue?
Moreover, it commits offsets but does not write data into the destination.
I thought that limiting the max records per run could help, but it looks like this parameter does not work properly when we use the JSON format:
Environment
Current Behavior
The Kafka source keeps polling records, causing an Out Of Memory error.
See logic here:
airbyte/airbyte-integrations/connectors/source-kafka/src/main/java/io/airbyte/integrations/source/kafka/KafkaSource.java
Line 94 in 59e20f2
Expected Behavior
Polling should be limited by number of records or by bytes.
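The expected behavior, stopping once a budget is hit rather than polling until the whole topic is buffered, can be sketched like this. A batch supplier stands in for `KafkaConsumer.poll(timeout)` so the sketch is self-contained; all names here are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a bounded read loop: keep polling batches until
// either the source is drained or a record budget is exhausted, instead of
// polling until every record has been read into memory.
public final class BoundedReadLoop {

  public static List<String> read(Supplier<List<String>> poll, int maxRecords) {
    final List<String> out = new ArrayList<>();
    while (out.size() < maxRecords) {
      final List<String> batch = poll.get(); // stand-in for consumer.poll(timeout)
      if (batch.isEmpty()) {
        break; // topic drained
      }
      for (String record : batch) {
        if (out.size() >= maxRecords) {
          break; // budget exhausted; stop without buffering the rest
        }
        out.add(record);
      }
    }
    return out;
  }
}
```

In a real connector, offsets for records beyond the budget must not be committed, which is the separate correctness issue raised earlier in the thread.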
Logs
Steps to Reproduce
Are you willing to submit a PR?
Not Now