
fix: ensure that during resumption of a scan, rows that have not been observed by the caller are re-requested #1444

Conversation

@danieljbruce (Contributor) commented Jul 10, 2024

Took over #1440:

createReadStream() creates a pipeline of streams that converts a stream of row chunks into a stream of logical rows. It also has logic to handle stream resumption when a single attempt fails.

The pipeline can be split into two parts: the persistent operation stream that the caller sees, and the transient per-attempt segment. When a retry attempt occurs, the per-attempt segment is unpiped from the operation stream and discarded, along with any buffered data its streams might contain. Unfortunately, when constructing the retry request, createReadStream() uses the last row key from the last buffered row, so the buffered rows end up omitted from the operation stream.

This PR fixes the missing-rows part by only referencing the row keys that were seen by the persistent operation stream when constructing a retry attempt. In other words, it ensures that we only update the lastSeenRow key once the row has been "committed" to the persistent portion of the pipeline.

If my understanding is correct, this should be sufficient to fix the correctness issue. However, the performance issue of re-requesting the dropped buffered data remains; that should be addressed separately.
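
A minimal sketch of that idea (the names are illustrative, not the library's actual identifiers): the resumption key is recorded only when a row passes through the persistent, caller-visible stage of the pipeline.

```ts
import {Transform} from 'stream';

// Illustrative only: the real pipeline lives inside createReadStream(). The
// point being sketched is *where* the resumption key gets updated.
let lastSeenRowKey: string | undefined;

// Persistent stage of the pipeline: rows passing through here have been
// "committed" to the part of the pipeline the caller observes, so it is
// safe to resume after them.
const userStream = new Transform({
  objectMode: true,
  transform(row: {key: string}, _encoding, callback) {
    lastSeenRowKey = row.key;
    callback(null, row);
  },
});

// On retry, the per-attempt segment (and anything it buffered) is discarded
// and the next request starts strictly after lastSeenRowKey, so rows that
// were buffered but never delivered are re-requested instead of skipped.
function buildRetryRowRange() {
  return lastSeenRowKey ? {startKeyOpen: lastSeenRowKey} : {};
}
```

Before this change, the key was effectively taken from rows that had only reached the discarded per-attempt segment, which is how the buffered rows went missing.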

Plus:

- Added a test that guarantees all 150 rows are sent back in the right order (see the sketch after this list).
- Modified tests so that the mocks schedule the last emitted event later.
- Fixed a test to work with Node v14.
- Removed watermarks.
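
A hypothetical shape for that test (it assumes a `table` fixture wired to the mock server used elsewhere in test/readrows.ts; this is not the exact test added by the PR):

```ts
import * as assert from 'assert';

it('should return all 150 rows in the right order', done => {
  const received: string[] = [];
  table
    .createReadStream()
    .on('data', (row: {id: string}) => received.push(row.id))
    .on('error', done)
    .on('end', () => {
      // Every row must arrive exactly once, in key order, even across retries.
      assert.strictEqual(received.length, 150);
      assert.deepStrictEqual(received, [...received].sort());
      done();
    });
});
```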

igorbernstein2 and others added 20 commits July 9, 2024 18:10
@product-auto-label bot added the labels size: l (Pull request size is large) and api: bigtable (Issues related to the googleapis/nodejs-bigtable API) on Jul 10, 2024
Review threads on src/index.ts, src/table.ts, test/readrows.ts, test/table.ts, and test/utils/readRowsImpl2.ts were resolved.
@danieljbruce danieljbruce marked this pull request as ready for review July 11, 2024 15:51
@danieljbruce danieljbruce requested review from a team as code owners July 11, 2024 15:51
test/readrows.ts Outdated
@@ -317,6 +318,42 @@ describe('Bigtable/ReadRows', () => {
});
});

it('should return row data in the right order', done => {
// 1000 rows must be enough to reproduce issues with losing the data and to create backpressure
Contributor

is this comment about backpressure still true given that the highwatermark is set to 0? if not, please remove, otherwise, feel free to resolve

Contributor Author

It's still true, because we want to create a scenario where there is backpressure in the chunk transformer and the other streams, in order to reproduce the issue that occurred before the fix when those transforms were thrown away. Note that this fix only applies a highWaterMark of 0 to the user stream.

However, this comment still does need an adjustment from 1000 to 150 :)
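
A small sketch of what that means in stream terms (illustrative, not the library's code): upstream stages keep their default object-mode buffers, so a slow consumer still creates backpressure there, while only the caller-facing stream gets highWaterMark: 0.

```ts
import {PassThrough, Transform} from 'stream';

// Upstream stage (e.g. the chunk transformer): default objectMode
// highWaterMark of 16, so rows can queue up here under backpressure.
const chunkTransformer = new Transform({
  objectMode: true,
  transform(chunk, _encoding, callback) {
    callback(null, chunk);
  },
});

// Caller-facing stream: highWaterMark of 0 means it holds no rows the caller
// has not read yet, so nothing user-visible sits in a buffer that could be
// discarded on retry.
const userStream = new PassThrough({objectMode: true, highWaterMark: 0});

chunkTransformer.pipe(userStream);
```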

Review threads on test/readrows.ts and test/utils/readRowsImpl2.ts were resolved.
@danieljbruce changed the title from "fix: Fix missing rows with test and fix for node 14 plus watermark removal" to "fix: ensure that during resumption of a scan, rows that have not been observed by the caller are re-requested" on Jul 11, 2024
Co-authored-by: Leah E. Cole <[email protected]>
}
let keyToRequestClosed: any;
if (
stream?.request?.rows?.rowRanges &&
Contributor

This file is pretty difficult to read... I looked at the file and there are 33 if statements! All this branching makes it difficult to follow the logic.

I would suggest we refactor this code to take a page from OOP. Can we group the if statements into classes, or at least into their own separate pieces of functionality, so that we call functionality we know will occur? From a cursory glance (I'm not really sure what the code does in this library) it looks like we're concerned with startKeys and endKeys. Maybe we could group the startKey functions and endKey functions into their own classes or (at least) pieces of logic, and then always call those pieces of logic. Those functions should handle edge/non-validated cases appropriately; that way we can easily read what happens to an input and what we get as an output.

TL;DR: if statements are hard to follow. If we can somehow group/parcel out the functionality to reduce the overall number of if statements, that would be ideal.
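
A hypothetical sketch of that kind of grouping, for illustration only (all names here are invented, not taken from the PR):

```ts
// Invented types/helpers to show how start-key and end-key handling could be
// funneled through small functions instead of nested if statements.
interface RowRange {
  startKeyOpen?: string;
  startKeyClosed?: string;
  endKeyOpen?: string;
  endKeyClosed?: string;
}

interface Bound {
  key: string;
  inclusive: boolean;
}

// All start-key interpretation lives in one place...
function startBound(range: RowRange): Bound | undefined {
  if (range.startKeyClosed !== undefined) return {key: range.startKeyClosed, inclusive: true};
  if (range.startKeyOpen !== undefined) return {key: range.startKeyOpen, inclusive: false};
  return undefined; // unbounded start
}

// ...and all end-key interpretation in another.
function endBound(range: RowRange): Bound | undefined {
  if (range.endKeyClosed !== undefined) return {key: range.endKeyClosed, inclusive: true};
  if (range.endKeyOpen !== undefined) return {key: range.endKeyOpen, inclusive: false};
  return undefined; // unbounded end
}

// Call sites then read as a single question rather than branching inline.
function keyInRange(key: string, range: RowRange): boolean {
  const start = startBound(range);
  const end = endBound(range);
  const afterStart = !start || (start.inclusive ? key >= start.key : key > start.key);
  const beforeEnd = !end || (end.inclusive ? key <= end.key : key < end.key);
  return afterStart && beforeEnd;
}
```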

Contributor Author

These are all good ideas. The server code was taken from https://github.com/googleapis/nodejs-bigtable/blob/main/test/utils/readRowsImpl.ts and adjusted in a hurry to mock correct server behaviour. I added some TODOs because I think that cleanup will take me some time.

Contributor

Can we defer cleaning up the test code until after we land the fix? I think this entire generator can be simplified significantly, but I really don't want us to block a fix for a data-loss bug on test-code hygiene. Daniel already added a TODO comment to simplify this.

@leahecole (Contributor) left a comment

Approving. Please create an issue for the cleanup concerns @sofisl and I have.

@danieljbruce danieljbruce merged commit 2d8de32 into googleapis:main Jul 11, 2024
15 of 19 checks passed