Flink: Fix IcebergSource tableloader lifecycle management in batch mode #9173
Conversation
cc: @stevenzwu
Force-pushed from c51d51b to a26a471.
ExecutorService workerPool =
    ThreadPools.newWorkerPool(threadName, scanContext.planParallelism());
try {
  plannerTableLoader.open();
I forgot to rename this variable earlier
// Create a copy of the table loader to avoid lifecycle management conflicts with the user
// provided table loader. This copy is only required for split planning, which uses the
// underlying io, and should be closed after split planning is complete
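For context, a minimal sketch of the batch planning flow this comment describes (simplified and assumed, not the actual method; `planSplits` is a stand-in for the real split planning call):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.util.ThreadPools;

class BatchPlanningSketch {
  // The batch path clones the user-provided loader so split planning owns its
  // own copy; the clone (and the FileIO it opens) is closed as soon as planning
  // completes, while the user's loader is left untouched.
  static List<?> planSplitsForBatch(TableLoader userLoader, String threadName, int parallelism) {
    ExecutorService workerPool = ThreadPools.newWorkerPool(threadName, parallelism);
    try (TableLoader plannerTableLoader = userLoader.clone()) {
      plannerTableLoader.open();
      Table table = plannerTableLoader.loadTable();
      return planSplits(table, workerPool); // stand-in for the real planning call
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      workerPool.shutdown();
    }
  }

  private static List<?> planSplits(Table table, ExecutorService workerPool) {
    throw new UnsupportedOperationException("placeholder for the split planner");
  }
}
```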
Why not clone/open/close it here?
Just to keep it the same as `ContinuousSplitPlannerImpl` and `tableName()`, which perform the open/close themselves. And for `ContinuousSplitPlannerImpl`, we can't close it until the source is closed.
Would it be more readable to abstract out the logic that requires a clone, i.e. the clone, open, and then close? I think that's also fine, but the original table loader still needs to be closed by the source at the end of initialization.
I usually prefer if the objects own the whole lifecycle of their child objects. So ideally (see the sketch after this list):
- `new ContinuousSplitPlannerImpl` should clone the loader itself, keep it as long as it needs, and close it at the end of the reading.
- `planSplitsForBatch` should clone the loader itself, keep it as long as it needs, and close it at the end of the planning.
- For `tableName`, I am a bit confused. In `planSplitsForBatch` we need to clone the loader to do the planning, but use the old loader to get the `tableName` for logging? Why not set the `tableName` value in the constructor, and forget the whole `lazyTable` thing?
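Roughly, the ownership pattern being proposed looks like this (an illustrative sketch; the class name and constructor shape are assumed, not the actual diff):

```java
import java.io.Closeable;
import java.io.IOException;
import org.apache.iceberg.flink.TableLoader;

class OwningPlannerSketch implements Closeable {
  private final TableLoader tableLoader;

  OwningPlannerSketch(TableLoader userTableLoader) {
    // Private copy: opening and closing it never touches the user's loader.
    this.tableLoader = userTableLoader.clone();
    this.tableLoader.open();
  }

  @Override
  public void close() throws IOException {
    // Released only when the source stops reading and closes the planner.
    tableLoader.close();
  }
}
```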
I agree with the pattern.
Wrt the `lazyTable` thing, this is because the `Table` needed to be a transient value (a `Table` itself is not serializable). But it seems we do not make a reference to it outside of the `lazyTable`, so I will remove it.
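For illustration, the simplification being agreed on could look like this (a minimal sketch; the class shape is assumed):

```java
import java.io.Serializable;
import org.apache.iceberg.Table;

class NameOnlySketch implements Serializable {
  // A String is serializable, so the name can be captured eagerly; no transient
  // Table field (and no lazyTable re-loading) is needed just for logging.
  private final String tableName;

  NameOnlySketch(Table table) {
    this.tableName = table.name();
  }
}
```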
assigner = assignerFactory.createAssigner(enumState.pendingSplits());
}

TableLoader tableLoaderCopy = tableLoader.clone();
Not needed
this.emitter = emitter;
this.tableName = table.name();
This is incorrect. `table` can be null and loaded lazily.
iceberg/flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java, line 482 in 8b7a280:

if (table == null) {
The builder method already handles the nullity case. I added a precondition check to verify that `table` is non-null before invoking the source constructor.

We can move `checkRequired` into the constructor too (I think that makes the code more readable), as in the sketch below.
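For illustration, moving the validation into the constructor might look like this (a sketch with assumed field names, not the actual change):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

class ConstructorCheckSketch {
  private final TableLoader tableLoader;
  private final String tableName;

  // Validation sits next to the assignments it guards, instead of living in a
  // separate checkRequired() invoked from the builder.
  ConstructorCheckSketch(TableLoader tableLoader, Table table) {
    this.tableLoader = Preconditions.checkNotNull(tableLoader, "tableLoader is required.");
    this.tableName = Preconditions.checkNotNull(table, "table is required.").name();
  }
}
```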
Previously, users didn't have to provide a valid table. This is a breaking change that requires users to supply a valid table object.
Doesn't this part of the code take care of creating the table?
iceberg/flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java, lines 481 to 489 in 820fc3c:

public IcebergSource<T> build() {
  if (table == null) {
    try (TableLoader loader = tableLoader) {
      loader.open();
      this.table = tableLoader.loadTable();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
Thanks for pointing out the code from the `build()` method. Yeah, this is fine; I missed it earlier.
Now I understand Mason's comment. I totally agree that it is more readable to move the `checkRequired()` call inside the constructor; right now it is detached.
Can't assume `table` is not null.
Force-pushed from bd0dd16 to 4ca21d3.
LGTM, Thanks!
Merged to main.
@mas-chen: Please do not forget to port the changes to the other Flink versions.
Why not support this for 1.15? Unfortunately, that is still the latest version one can use on the managed AWS Flink service.
@javrasya: The general policy is that we support the last 3 Flink releases in the Flink connectors. We follow the same pattern with the Iceberg connector, and support Flink 1.18/1.17/1.16 in the upcoming Iceberg releases. If you think you would need this in older Iceberg releases, like 1.14.4 (if there will be any), then it should be backported to
This fixes a connection pool issue that prevents IcebergSource from starting up in batch mode. The root cause is that the FileIO is closed when the tableLoader is closed via try-with-resources statements. We need multiple copies of the user-provided tableLoader parameter: to retrieve the table name, to do split planning, and to close the loader after each use. The fix is to clone the table loader so that it can be opened/closed by the batch split planning mechanism. This patch has been manually verified.
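To make the root cause and the fix concrete, here is a hedged sketch of both patterns (simplified; method names like `brokenLoad` and `planWithClone` are illustrative, not from the patch):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.TableLoader;

class RootCauseSketch {
  // Problem: try-with-resources closes the loader on exit, and closing the
  // loader closes the underlying FileIO, leaving the loaded Table unusable
  // when batch split planning later tries to read manifests.
  static Table brokenLoad(TableLoader userLoader) {
    try (TableLoader loader = userLoader) {
      loader.open();
      return loader.loadTable(); // the Table's FileIO is closed as we return
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Fix: do the FileIO-dependent work against a clone, inside the try, so
  // closing the clone affects no other consumer of the user's loader.
  static void planWithClone(TableLoader userLoader) {
    try (TableLoader copy = userLoader.clone()) {
      copy.open();
      Table table = copy.loadTable();
      // ... split planning happens here, while the clone's FileIO is open ...
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```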