
fix: ensure that during resumption of a scan, rows that have not been observed by the caller are re-requested #1444

Conversation

@danieljbruce (Contributor) commented Jul 10, 2024

Took over #1440:

createReadStream() creates a pipeline of streams that converts a stream of row chunks into a stream of logical rows. It also has logic to handle stream resumption when a single attempt fails.

The pipeline can be split into two parts: the persistent operation stream that the caller sees, and the transient per-attempt segment. When a retry attempt occurs, the per-attempt segment is unpiped from the operation stream and discarded, along with any buffered data its streams might contain. Unfortunately, when constructing the retry request, createReadStream() uses the last row key from the last buffered row, so the buffered rows end up omitted from the operation stream.

This PR fixes the missing-rows part by only referencing the row keys that were seen by the persistent operation stream when constructing a retry attempt. In other words, it ensures that we only update the lastSeenRow key once the row has been "committed" to the persistent portion of the pipeline.

If my understanding is correct, this should be sufficient to fix the correctness issue. However, the performance issue of re-requesting the dropped buffered data remains; that should be addressed separately.
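
A minimal sketch of that idea (the names are illustrative, not the library's actual identifiers): the resumption key is recorded only when a row passes through the persistent, caller-visible stage of the pipeline.

```ts
import {Transform} from 'stream';

// Illustrative only: the real pipeline lives inside createReadStream(). The
// point being sketched is *where* the resumption key gets updated.
let lastSeenRowKey: string | undefined;

// Persistent stage of the pipeline: rows passing through here have been
// "committed" to the part of the pipeline the caller observes, so it is
// safe to resume after them.
const userStream = new Transform({
  objectMode: true,
  transform(row: {key: string}, _encoding, callback) {
    lastSeenRowKey = row.key;
    callback(null, row);
  },
});

// On retry, the per-attempt segment (and anything it buffered) is discarded
// and the next request starts strictly after lastSeenRowKey, so rows that
// were buffered but never delivered are re-requested instead of skipped.
function buildRetryRowRange() {
  return lastSeenRowKey ? {startKeyOpen: lastSeenRowKey} : {};
}
```

Before this change, the key was effectively taken from rows that had only reached the discarded per-attempt segment, which is how the buffered rows went missing.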

Plus:

- Added a test that guarantees all 150 rows are sent back in the right order (see the sketch after this list).
- Modified tests so that the mocks schedule the last emitted event later.
- Fixed a test to work with Node v14.
- Removed watermarks.
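
A hypothetical shape for that test (it assumes a `table` fixture wired to the mock server used elsewhere in test/readrows.ts; this is not the exact test added by the PR):

```ts
import * as assert from 'assert';

it('should return all 150 rows in the right order', done => {
  const received: string[] = [];
  table
    .createReadStream()
    .on('data', (row: {id: string}) => received.push(row.id))
    .on('error', done)
    .on('end', () => {
      // Every row must arrive exactly once, in key order, even across retries.
      assert.strictEqual(received.length, 150);
      assert.deepStrictEqual(received, [...received].sort());
      done();
    });
});
```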

igorbernstein2 and others added 20 commits July 9, 2024 18:10
@product-auto-label bot added the labels size: l (Pull request size is large) and api: bigtable (Issues related to the googleapis/nodejs-bigtable API) on Jul 10, 2024
Review threads on src/index.ts, src/table.ts, test/readrows.ts, test/table.ts, and test/utils/readRowsImpl2.ts were resolved.
@danieljbruce danieljbruce marked this pull request as ready for review July 11, 2024 15:51
@danieljbruce danieljbruce requested review from a team as code owners July 11, 2024 15:51
test/readrows.ts Outdated
@@ -317,6 +318,42 @@ describe('Bigtable/ReadRows', () => {
});
});

it('should return row data in the right order', done => {
// 1000 rows must be enough to reproduce issues with losing the data and to create backpressure
Contributor

is this comment about backpressure still true given that the highwatermark is set to 0? if not, please remove, otherwise, feel free to resolve

Contributor Author

It's still true, because we want to create a scenario where there is backpressure in the chunk transformer and the other streams, in order to reproduce the issue that occurred before the fix when those transforms were thrown away. Note that this fix only applies a highWaterMark of 0 to the user stream.

However, this comment still does need an adjustment from 1000 to 150 :)
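
A small sketch of what that means in stream terms (illustrative, not the library's code): upstream stages keep their default object-mode buffers, so a slow consumer still creates backpressure there, while only the caller-facing stream gets highWaterMark: 0.

```ts
import {PassThrough, Transform} from 'stream';

// Upstream stage (e.g. the chunk transformer): default objectMode
// highWaterMark of 16, so rows can queue up here under backpressure.
const chunkTransformer = new Transform({
  objectMode: true,
  transform(chunk, _encoding, callback) {
    callback(null, chunk);
  },
});

// Caller-facing stream: highWaterMark of 0 means it holds no rows the caller
// has not read yet, so nothing user-visible sits in a buffer that could be
// discarded on retry.
const userStream = new PassThrough({objectMode: true, highWaterMark: 0});

chunkTransformer.pipe(userStream);
```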

Review threads on test/readrows.ts and test/utils/readRowsImpl2.ts were resolved.
@danieljbruce changed the title from "fix: Fix missing rows with test and fix for node 14 plus watermark removal" to "fix: ensure that during resumption of a scan, rows that have not been observed by the caller are re-requested" on Jul 11, 2024
Co-authored-by: Leah E. Cole <[email protected]>
}
let keyToRequestClosed: any;
if (
stream?.request?.rows?.rowRanges &&
Contributor

This file is pretty difficult to read... I looked at the file and there are 33 if statements! All this branching makes it difficult to follow the logic.

I would suggest we refactor this code to take a page from OOP. Can we group the if statements into classes, or at least into their own separate pieces of functionality, so that we call functionality we know will occur? From a cursory glance (I'm not really sure what the code does in this library) it looks like we're concerned with startKeys and endKeys. Maybe we could group the startKey functions and endKey functions into their own classes or (at least) pieces of logic, and then always call those pieces of logic. Those functions should handle edge/non-validated cases appropriately; that way we can easily read what happens to an input and what we get as an output.

TL;DR: if statements are hard to follow. If we can somehow group/parcel out the functionality to reduce the overall number of if statements, that would be ideal.
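
A hypothetical sketch of that kind of grouping, for illustration only (all names here are invented, not taken from the PR):

```ts
// Invented types/helpers to show how start-key and end-key handling could be
// funneled through small functions instead of nested if statements.
interface RowRange {
  startKeyOpen?: string;
  startKeyClosed?: string;
  endKeyOpen?: string;
  endKeyClosed?: string;
}

interface Bound {
  key: string;
  inclusive: boolean;
}

// All start-key interpretation lives in one place...
function startBound(range: RowRange): Bound | undefined {
  if (range.startKeyClosed !== undefined) return {key: range.startKeyClosed, inclusive: true};
  if (range.startKeyOpen !== undefined) return {key: range.startKeyOpen, inclusive: false};
  return undefined; // unbounded start
}

// ...and all end-key interpretation in another.
function endBound(range: RowRange): Bound | undefined {
  if (range.endKeyClosed !== undefined) return {key: range.endKeyClosed, inclusive: true};
  if (range.endKeyOpen !== undefined) return {key: range.endKeyOpen, inclusive: false};
  return undefined; // unbounded end
}

// Call sites then read as a single question rather than branching inline.
function keyInRange(key: string, range: RowRange): boolean {
  const start = startBound(range);
  const end = endBound(range);
  const afterStart = !start || (start.inclusive ? key >= start.key : key > start.key);
  const beforeEnd = !end || (end.inclusive ? key <= end.key : key < end.key);
  return afterStart && beforeEnd;
}
```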

Contributor Author

These are all good ideas. The server code was taken from https://github.com/googleapis/nodejs-bigtable/blob/main/test/utils/readRowsImpl.ts and adjusted in a hurry to mock correct server behaviour. I added some TODOs because I think that cleanup will take me some time.

Contributor

Can we defer cleaning up the test code until after we land the fix? I think this entire generator can be simplified significantly, but I really don't want us to block a fix for a data-loss bug on test-code hygiene. Daniel already added a TODO comment to simplify this.

@leahecole (Contributor) left a comment

Approving. Please create an issue for the cleanup concerns @sofisl and I have.

@danieljbruce danieljbruce merged commit 2d8de32 into googleapis:main Jul 11, 2024
15 of 19 checks passed