
Source Kafka: polling records cause OutOfMemory #14382

Open
tunguyen-12 opened this issue Jul 4, 2022 · 10 comments
Labels
area/databases · community · connectors/source/kafka · frozen (Not being actively worked on) · team/db-dw-sources (Backlog for Database and Data Warehouse Sources team) · type/bug (Something isn't working)

Comments

@tunguyen-12

Environment

  • Airbyte version: 0.39.29-alpha
  • Deployment: Kubernetes
  • Source Connector and version: airbyte kafka 0.1.7
  • Destination Connector and version: airbyte bigquery 1.1.11
  • Step where error happened: Sync job

Current Behavior

Kafka Source keeps polling records, which causes OutOfMemory.
See the polling logic here:

Expected Behavior

The source should limit each batch by number of records or by total bytes.

Logs

Steps to Reproduce

  1. Batch records by number of records or by bytes (see the sketch below).
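
A minimal sketch, not the connector's actual code, of what bounding each poll loop by record count and total bytes could look like; MAX_RECORDS, MAX_BYTES, and processBatch are illustrative names, and consumer/polling_time are assumed from the existing connector snippet quoted later in this thread:

final List<ConsumerRecord<String, JsonNode>> batch = new ArrayList<>();
long totalBytes = 0;
while (batch.size() < MAX_RECORDS && totalBytes < MAX_BYTES) {
  final ConsumerRecords<String, JsonNode> polled =
      consumer.poll(Duration.of(polling_time, ChronoUnit.MILLIS));
  if (polled.isEmpty()) {
    break; // fall back to the existing empty-poll retry logic here
  }
  for (final ConsumerRecord<String, JsonNode> record : polled) {
    batch.add(record);
    // serialized*Size() returns -1 for null keys/values, hence the guards.
    totalBytes += Math.max(0, record.serializedKeySize())
        + Math.max(0, record.serializedValueSize());
  }
}
processBatch(batch); // flush to the destination, then poll the next batch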

Are you willing to submit a PR?

Not Now

@marcosmarxm
Member

@tunguyen-12 I agree with your suggestion. It looks like the connector uses other logic to stop the sync, which can cause the problem. I added this issue to the connector team roadmap for future implementation.

final ConsumerRecords<String, JsonNode> consumerRecords =
    consumer.poll(Duration.of(polling_time, ChronoUnit.MILLIS));
if (consumerRecords.count() == 0) {
  // Only consecutive empty polls count toward stopping; the number of
  // records read while the topic still has data is unbounded.
  pollCount++;
  if (pollCount > retry) {
    break;
  }
}

Can you give more information about the server instance you're running Airbyte on and the collection size in Kafka, so the team can reproduce it easily?

@tunguyen-12
Author

tunguyen-12 commented Jul 5, 2022

Hi @marcosmarxm, Here is the information about my deployment

  • Airbyte version: 0.39.29-alpha
  • Deployment: Kubernetes
  • Source Connector and version: airbyte kafka 0.1.7
  • Destination Connector and version: airbyte bigquery 1.1.11
  • Step where error happened: Sync data from Kafka
  • Kafka source: produces 190 msg/sec, ~4 MB/sec => causes OutOfMemory
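
For scale: at ~4 MB/sec of incoming messages, a sync that keeps polling until the topic is drained buffers roughly 4 MB × 600 s ≈ 2.4 GB after just ten minutes, which already exceeds a typical default JVM heap.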

@tunguyen-12
Author

@marcosmarxm Could you give me more details about the problem you mentioned: "It looks like the connector uses other logic to stop the sync, which can cause the problem"?
My questions:

  • Can we batch here by number of records or by bytes?
  • Does the Worker batch data received from the source pod and then flush it to the destination pod?

Thanks

@marcosmarxm
Member

Does the Worker batch data received from the source pod and then flush it to the destination pod?

From the code pasted here, it looks like the connector tries to read all the records in the collection, stops only after that, and then flushes to the destination. That is not going to work for your use case with fast message generation; this is probably the reason for the OOM.

@tunguyen-12
Author

tunguyen-12 commented Jul 5, 2022

My idea is reading by micro-batch: implement a LazyIterator with an abstract List load() function. When hasNext() is called, return true if there are messages available in the global queue; if not, call the load() function.

My concern here is: will this cause OOM in the worker?
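
A minimal sketch of the LazyIterator idea described above, assuming a plain in-memory queue; everything beyond the load()/hasNext()/next() contract is illustrative:

import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;

public abstract class LazyIterator<T> implements Iterator<T> {

  // Buffers only the current micro-batch, keeping memory usage bounded.
  private final Queue<T> queue = new ArrayDeque<>();

  // Loads the next micro-batch from the source; an empty list means exhausted.
  protected abstract List<T> load();

  @Override
  public boolean hasNext() {
    if (queue.isEmpty()) {
      queue.addAll(load()); // lazily fetch the next micro-batch
    }
    return !queue.isEmpty();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return queue.poll();
  }
}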


@marcosmarxm
Member

I think it's safer compared to the logic implemented today.

@tunguyen-12
Author

I think it's safer compared to the logic implemented today.

Yeah, I have read the Debezium flow, applied the same logic to the Kafka flow, and it works like a charm. I'll consider submitting a PR in the near future.

@grishick grishick added the team/db-dw-sources Backlog for Database and Data Warehouse Sources team label Sep 27, 2022
@obastemur

IMHO, a record-count limit has a few shortcomings that prevent making full use of an expectedly long-running task.

It feels like this could be managed by tracking the total fetch size (not the number of records) if the problem is OOM? Still, cutting the job short is an issue.

@alexnikitchuk
Contributor

Is there any workaround for the issue? I have a 10 GB topic which Airbyte fails to read with the error:

Terminating due to java.lang.OutOfMemoryError: Java heap space

Moreover, it commits offsets but does not write the data into the destination.

@alexnikitchuk
Contributor

alexnikitchuk commented Oct 29, 2022


I thought that limiting the max records per run could help, but it looks like this parameter does not work properly with the JSON format: the record_counter increment is present in the AVRO path but missing in the JSON path (see the sketch below).
FYI @sivankumar86
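
A minimal sketch of the per-run cap implied above; maxRecords, recordCounter, and emit are illustrative names, not the connector's actual identifiers. The point is that the JSON read path must increment the counter exactly as the AVRO path does:

int recordCounter = 0;
for (final ConsumerRecord<String, JsonNode> record : consumerRecords) {
  emit(record);      // hand the record to the Airbyte message stream
  recordCounter++;   // per the report above, this increment is missing for JSON
  if (recordCounter >= maxRecords) {
    return;          // stop the run once the cap is reached
  }
}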

@marcosmarxm marcosmarxm changed the title Kafka Source Cause OutOfMemory Source Kafka: polling records cause OutOfMemory Nov 30, 2022
@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024