feat: `PersistedShapeStream` with generic storage interface #1630

msfstef · 2024-09-04T16:30:19Z

Addresses #1519

Followed by #1519 for more refactoring and improvements, split into two PRs for reviewability

Introduces a PersistedShapeStream - follows the newly defined ShapeStreamInterface so it can be used interchangeably, but requires:

A storage option to be provided, implementing the ShapeStreamStorage interface, which is a key-value store oriented interface
To not provide an offset and shapeId as the persisted shape stream cannot resume from an arbitrary offset and shapeId, it uses the persisted ones (although I suppose we could make it wipe the underlying store if an offset and shpaeId are provided to simulate the exact same behaviour)

What it does is it subscribes to the underlying ShapeStream and compacts entries to the provided store, very similar to what Shape does. When someone subscribes to the PersistedShapeStream, it looks into the store and returns the compacted entries as inserts and resumes from there.

One tricky bit with a bit of a hacky solution is the persistence of the offset and shapeId, needed to actually resume the stream. We could store them in a separate metadata store entry, but that might require either 1) custom key prefixes on the store to separate metadata and 2) deal with inconsistent state.

For example, if metadata is stored separately from the operations and the store has no transactional guarantees (which the interface assumes), an operation might be stored with offset 10_3, and the metadata fails to get stored because the app shuts down - so upon resuming the 10_3 operation is fed to the subscriber but then the stream is resumed from the previous latest offset that we managed to persist (e.g. 10_2), duplicating the change.

The workaround is that the latest stored offset can be retrieved by reading into the stored entries and comparing their offsets (assuming the offset is not opaque here, but we can do that by adding a timestamp otherwise) and using the most recent one - has an overhead of needing to read the whole store to resume but is foolproof and can also do consistency checks to decide if it actually can resume.

The shapeId I've stored alongside the entries, but really only in one entry stored, such that when we iterate over the data to get the latest offset we will eventually find the shapeId as well, but without needing to store it fully in every entry. Depending on the actual storage implementation this approach might be effective or slightly annoying. My main concern was to 1) avoid dealing with consistency with non-transactional stores and 2) not introduce separate storage APIs that people need to conform to.

If someone wants to use a transactional storage engine, they might as well go to PGlite with the sync plugin - I believe the use case for this is simpler key-value stores like localStorage etc.

As part of the work done for #1630 I had to introduce a `ShapeStreamInterface`, and generally make the client slightly more modular in its utilities. I've split out a bunch of things used and added unit tests for most (found some issues such as the `fetchWithBackoff` never terminating). I've also refactored the classes to use the [proper private fields](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Private_properties) in modern JS. --------- Co-authored-by: Kyle Mathews <[email protected]>

kevin-dp

This PR introduces many changes which makes it hard to review and merge.

I'm mainly concerned with the way how the code changed to handle synchronous and asynchronous callbacks differently. The message processor used to handle every callback as an asynchronous callback (also synchronous ones). As a result, the code was clear and concise. However, this PR handles sync callbacks differently (mainly to avoid JS pushing synchronous continuations on the microtask queue). As a result, code is written in a continuation passing style everywhere which makes it harder to understand.

I had a brief discussion with @icehaunter which shared the concern and would also stick with await for now. If performance is a concern here, we should benchmark a sync storage with the old implementation (that always awaits) and with the optimized implementation that doesn't await.

kevin-dp · 2024-09-06T08:05:43Z

packages/typescript-client/src/async-or.ts

@@ -0,0 +1,44 @@
+import { PromiseOr } from './types'


Can we move the contents of this file to the helpers.ts file?

kevin-dp · 2024-09-06T08:15:51Z

packages/typescript-client/src/async-or.ts

@@ -0,0 +1,44 @@
+import { PromiseOr } from './types'
+
+export function isPromise<T>(promise: PromiseOr<T>): promise is Promise<T> {


nit: a more accurate name would be isThenable as any object with a then method will pass this check.
If you actually only want to test for promise, all you need is promise instanceof Promise.

kevin-dp · 2024-09-06T08:25:10Z

packages/typescript-client/src/async-or.ts

+  )
+}
+
+export function asyncOrCall<T>(


I find this name slightly confusing. What about apply ?

kevin-dp · 2024-09-06T08:32:15Z

packages/typescript-client/src/client.ts

+export class ShapeStream<T extends Row = Row>
+  implements ShapeStreamInterface<T>
+{
+  readonly #options: ShapeStreamOptions


i wouldn't make the options private, especially since they are read only so there's no harm in making them public for introspectability.

kevin-dp · 2024-09-06T08:44:26Z

packages/typescript-client/src/types.ts

@@ -108,3 +108,5 @@ export type TypedMessages<T extends Row = Row> = {
  messages: Array<Message<T>>
  schema: ColumnInfo
 }
+
+export type PromiseOr<T> = T | Promise<T>


nit: not a fan of the name. What do you think about MaybePromise<T> ?

kevin-dp · 2024-09-06T08:55:03Z

packages/typescript-client/src/offset.ts

+export function compareOffset(offsetA: Offset, offsetB: Offset): 1 | 0 | -1 {
+  const [oAx, oAy] = splitOffset(offsetA)
+  const [oBx, oBy] = splitOffset(offsetB)
+  if (oAx > oBx) return 1


Sorting functions in JS can handle comparator functions that return values other than -1, 0, and 1.
This is very convenient for defining such comparator functions:

const [a1, a2] = splitOffset(offsetA) const [b1, b2] = splitOffset(offsetB) return a1 - b1 || a2 - b2

(i also renamed the variables because i found them hard to read)

kevin-dp · 2024-09-06T09:05:27Z

packages/typescript-client/src/queue.ts

+  #processingChain: PromiseOr<void> = undefined
+
+  public process(callback: () => PromiseOr<void>): PromiseOr<void> {
+    this.#processingChain = asyncOrCall(


Why not always process async like before, using await even if the callback is not async?
That reduces the complexity of the code and the indirections to the helper functions such as asyncOrCall.

kevin-dp · 2024-09-06T09:14:40Z

packages/typescript-client/src/queue.ts

+    this.#callback = callback
+  }
+
+  public process(messages: T): void {


i'd keep it as it was before with the loop that took the next one out of the queue and applied it async: await this.#callback(messages).

kevin-dp · 2024-09-06T09:50:48Z

packages/typescript-client/src/persist.ts

+  subscribe(
+    callback: (messages: Message<T>[]) => PromiseOr<void>,
+    onError?: (error: FetchError | Error) => void
+  ): () => void {


This method and #hydrateStream warrant some comments explaining what is happening as it is non-trivial to understand with the callbacks being passed around.

KyleAMathews · 2024-09-08T17:42:33Z

an operation might be stored with offset 10_3, and the metadata fails to get stored because the app shuts down - so upon resuming the 10_3 operation is fed to the subscriber but then the stream is resumed from the previous latest offset that we managed to persist (e.g. 10_2), duplicating the change.

We could also filter out the extra operations to avoid the duplicates.

That seems better so that on loading we can immediately send the sync call and not have to wait to load all the old data first.

KyleAMathews · 2024-09-09T02:59:00Z

How about we put this on the back burner for some time? Being able to persist the shape log will be a nice feature at some point but it's not particularly urgent until we get a lot more users and some that definitely need this. Most users won't need this and I don't think it's worth a lot of time to benchmark and fine-tune this — especially if it requires a big refactor.

@kevin-dp

Extracted out refactoring from #1630 ~~Had to also add a `subscribeSync` method on `ShapeStream` otherwise we cannot guarantee that `Shape` is up to date when the stream is up to date, since the processing is done asynchronously. We have three options:~~ ~~1. We re-introduce the `PromiseOr` abstractions in order for the single `subscribe` call to handle both sync and async~~ ~~2. We introduce an explicit `subcsribeSync` like I have done here, which provides strong guarantees on up-to-date synchronicity~~ ~~3. We change `Shape` to maintain it's own "up to date" logic as it is not guaranteed that when the stream is up to date that the `Shape` will have materialized its local state fully, since processing is async~~ Reintroduced thin version of sync + async handling in the queue to maintain the current API and intended behaviour. I've discussed with @kevin-dp and we think that actually making the processing callbacks create backpressure on the stream rather than the stream collecting the responses non-stop might be a better API and allow the developers to choose the pace at which the stream will collect data through the choice of processing callback. Will open a new PR for this, so we can merge this one with the refactor.

KyleAMathews · 2024-10-08T16:41:30Z

This is pretty out-of-date w/ main & not a priority so I'll close it for now.

msfstef requested review from kevin-dp and KyleAMathews September 5, 2024 08:03

msfstef marked this pull request as ready for review September 5, 2024 08:03

msfstef changed the title ~~[WIP] feat: PersistedShapeStream with generic storage interface~~ feat: PersistedShapeStream with generic storage interface Sep 5, 2024

msfstef added 10 commits September 5, 2024 11:27

Add interface to implement persister

da96db8

Add async or possibilities

112b075

Fix types on persisted stream

61f7765

Test persistence

43c316d

Add offset comparison utility

07f8873

Add changeset

7967cbd

Use PromiseOr type everywhere

d15d46b

Move asyncOr utilities to own file

02f6225

Add tests for async or utilitites

e227452

Add package export

9f3da50

msfstef force-pushed the msfstef/shape-stream-persister branch from dd5da2d to 9f3da50 Compare September 5, 2024 08:27

msfstef mentioned this pull request Sep 5, 2024

chore: Client refactoring #1634

Merged

msfstef and others added 3 commits September 5, 2024 14:49

Fix persister for async storage

b49ab53

Reduce async latency

f2b6cbe

kevin-dp mentioned this pull request Sep 6, 2024

feat: PersistedShapeStream with generic storage interface #1638

Closed

kevin-dp requested changes Sep 6, 2024

View reviewed changes

msfstef mentioned this pull request Sep 16, 2024

chore: Refactor client #1711

Merged

KyleAMathews closed this Oct 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: `PersistedShapeStream` with generic storage interface #1630

feat: `PersistedShapeStream` with generic storage interface #1630

msfstef commented Sep 4, 2024 •

edited

Loading

kevin-dp left a comment

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

kevin-dp Sep 6, 2024

KyleAMathews commented Sep 8, 2024

KyleAMathews commented Sep 9, 2024

KyleAMathews commented Oct 8, 2024

		@@ -0,0 +1,44 @@
		import { PromiseOr } from './types'

		export function isPromise<T>(promise: PromiseOr<T>): promise is Promise<T> {

feat: PersistedShapeStream with generic storage interface #1630

feat: PersistedShapeStream with generic storage interface #1630

Conversation

msfstef commented Sep 4, 2024 • edited Loading

kevin-dp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KyleAMathews commented Sep 8, 2024

KyleAMathews commented Sep 9, 2024

KyleAMathews commented Oct 8, 2024

feat: `PersistedShapeStream` with generic storage interface #1630

feat: `PersistedShapeStream` with generic storage interface #1630

msfstef commented Sep 4, 2024 •

edited

Loading