Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-2935: Avoid double close of ParquetFileWriter #2951

Merged
merged 2 commits into from
Jul 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,7 @@ public static enum Mode {

// set when end is called
private ParquetMetadata footer = null;
private boolean closed;

private final CRC32 crc;
private final ReusingByteBufferAllocator crcAllocator;
Expand Down Expand Up @@ -1658,11 +1659,16 @@ public void end(Map<String, String> extraMetaData) throws IOException {

@Override
public void close() throws IOException {
if (closed) {
return;
}
try (PositionOutputStream temp = out) {
temp.flush();
if (crcAllocator != null) {
crcAllocator.close();
}
} finally {
closed = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put this in the finally block just in case of any exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it in finally block makes it be not retryable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it in finally block makes it be not retryable.

That was also my initial thought. I was actually torn between these two choices, but I suppose it's rare for people to actually retry failed close operation. Feedback is welcome.

Copy link

@doki23 doki23 Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's rare for people to actually retry failed close operation

I agree with you, it's rare.
But if someone retries, it will not work as expected -- file is not closed as expected. This is not user friendly. And if people don't retry, it's no matter where we put it in, right? But it will also lead to another problem -- the external finally block will throw an exception that suppresses the original exception.

I also find InteralParquetRecordWriter sets closed to true in the finally block too. But if AutoCloseables.uncheckedClose throws an exception, it'll keep false.

  public void close() throws IOException, InterruptedException {
    if (!closed) {
      try {
        ......
      } finally {
        AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter);
        closed = true;
      }
    }
  }

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that try {} finally { closed = true; } is common in parquet writer code base, so it' ok to follow up.

}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -286,6 +286,9 @@ public void testWriteRead() throws Exception {
w.endColumn();
w.endBlock();
w.end(new HashMap<String, String>());
// Although writer is already closed in previous end(),
// explicitly close it again to verify double close behavior.
w.close();
Fokko marked this conversation as resolved.
Show resolved Hide resolved

ParquetMetadata readFooter = ParquetFileReader.readFooter(configuration, path);
assertEquals("footer: " + readFooter, 2, readFooter.getBlocks().size());
Expand Down