Reinstate CheckpointAsync #616

bartelink · 2019-08-01T02:22:28Z

Is your feature request related to a problem? Please describe.

I've decoupled my consumption from my checkpointing (i.e., I don't necessarily synchronously process all change feed items, instead letting reading get ahead of the checkpointable position in order that I can manage buffering and retries for performance)
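To make the decoupling concrete, here is a minimal, language-neutral sketch in Python. It is not the Cosmos SDK API: `CheckpointTracker` and its methods are hypothetical names used only for illustration. The idea is that batches are registered as they are read, may complete out of order, and the checkpointable position only advances over a contiguous prefix of completed batches.

```python
from collections import OrderedDict


class CheckpointTracker:
    """Hypothetical helper: only a contiguous completed prefix may be checkpointed."""

    def __init__(self):
        # position -> completed flag, kept in the order batches were read
        self._batches = OrderedDict()
        self.checkpointed = None  # last position that is safe to persist

    def batch_read(self, position):
        """Register a batch as soon as it is read, before processing finishes."""
        self._batches[position] = False

    def batch_completed(self, position):
        """Mark a batch processed; return the new checkpointable position, or None."""
        self._batches[position] = True
        advanced = None
        # Drop completed batches from the front; stop at the first one still in flight.
        while self._batches:
            oldest, done = next(iter(self._batches.items()))
            if not done:
                break
            self._batches.popitem(last=False)
            advanced = oldest
        if advanced is not None:
            self.checkpointed = advanced
        return advanced


tracker = CheckpointTracker()
for position in (10, 20, 30):
    tracker.batch_read(position)

first = tracker.batch_completed(20)   # batch 10 still in flight: nothing to checkpoint
second = tracker.batch_completed(10)  # 10 and 20 are both done: checkpoint can move to 20
third = tracker.batch_completed(30)
```

In a real processor, the point where `batch_completed` returns a position is where a `CheckpointAsync`-style call would be issued, which is why the call needs to be exposed separately from the batch handler returning.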

Describe the solution you'd like

Provide an overload that exposes something akin to the IChangeFeedProcessorContext.CheckpointAsync method which the v2 CFP API formerly exposed (it's present but internal atm)

Describe alternatives you've considered

Only alternative is to use the v2 CFP SDK, but that closes off tonnes of options and is not tenable from my perspective.

Additional context

I flagged this to @ealsur some time ago in the context of some other work in this repo. While Port to Azure.Cosmos / v3 SDK jet/equinox#144 illustrates that the V3 SDK provides some very nice cleanup in general for a relatively complex use case, not being able to port the CFP aspect easily presents a problem in the medium term with having client teams adopt the V3 SDK.
Consumption code
- shows how we use (and need) CheckpointAsync
- illustrates that in general, the V3 API is good - it will cut out a lot of boilerplate wrapping we formerly had to provide.
- never lets an exception escape from the processor observer function, avoiding the fact that the CFP logic does not retry and/or otherwise correctly handle exceptions.
- illustrates the level of instrumentation (i.e. context re logging Range Ids etc) which cannot be exposed at the present time due to the absence of IChangeFeedProcessorContext (see related: Provide partitionid to ChangeFeedObserverFactory.Create #400)
Document parsing code
- this code cannot be directly ported due to the fact that the Document class has become internal in V3 - it'd be nice to have a sample illustrate how one might most cheaply probe documents to determine whether they are parseable as a given type as was formerly possible.


ealsur · 2019-08-02T17:17:06Z

@bartelink Is your request to add support for Manual Checkpointing?

bartelink · 2019-08-02T18:43:34Z

Yes, in essence; that's the key facility that's missing compared to the CFP v2 API.

To do the full port, I will also need a way to do an equivalent of GetPropertyValue and a way to identify the partition/range id associated with each set of Documents from the changefeed so I can control the max read ahead on a per range basis.

bartelink · 2019-10-04T00:26:26Z

I really want to start migrating various apps of ours to the V3 SDK wrt CFP logic, but I can't use it as it is. Is there any rough roadmap re when there'll be space to consider this?

ealsur · 2019-10-04T00:52:42Z

As for the CheckpointAsync method, I'm not sure there will be a point to it if we move forward with enabling a higher degree of parallelization beyond the partition (i.e., multiple threads can be reading the same partition, each thread with a different range of partition key values within the same partition). Won't this higher degree of processing solve the need for buffering?

Exposing the partition information won't happen, as this is something that is not exposed anywhere in the V3 SDK. At max, we could think about exposing the LeaseToken for context. While it currently works as 1 lease 1 partition, we are looking into expanding that as mentioned in the first paragraph, so it cannot be inferred from the LeaseToken.

Regarding your Document point, you have multiple options:

- You could use dynamic, or JObject, or the class of your choice
- You could use your own custom Serializer to manage the deserialization process.
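As a rough illustration of the first option, the sketch below parses a document generically and probes it before committing to a type. It is written in Python as a language-neutral analogue of reading into a JObject; the `Event` type, its field names, and `try_parse_event` are assumptions made up for the example, not part of any SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical target type; the field names are illustrative only."""
    stream: str
    index: int


def try_parse_event(raw: str) -> Optional[Event]:
    """Return an Event if the document has the expected shape, else None."""
    doc = json.loads(raw)  # generic parse, analogous to reading into a JObject
    if not isinstance(doc, dict):
        return None
    stream, index = doc.get("stream"), doc.get("index")
    # Probe the fields before committing to the typed representation.
    if isinstance(stream, str) and isinstance(index, int) and not isinstance(index, bool):
        return Event(stream, index)
    return None


parsed = try_parse_event('{"id": "1", "stream": "cart-42", "index": 3}')
skipped = try_parse_event('{"id": "2", "kind": "snapshot"}')
```

Documents that do not match (`skipped` here) are simply filtered out rather than raising, which is the cheap "is this parseable as type X" check the original request asks about.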

bartelink · 2019-10-05T01:34:25Z

CheckpointAsync ... Won't this higher degree of processing solve the need for buffering?

That depends on exactly what you have up your sleeves ;) The benefits of being able to decouple checkpointing from read/process/write cycle include:

- not waiting for roundtrips to (what can be a highly contended) aux container
- being able to overlap transmitting, parsing and filtering of data with processing
- being able to maintain balanced throughput in the face of an item early in a sequence of batches which takes inordinately long to process (let's say I'm processing a reaction to something which happens to hit rate limiting due to a hotspot)
- if one has algorithms that benefit from grouping and/or deduplicating by reading ahead, they can't succeed if only one batch from a given logical subset can ever be read at a time
- if you restrict by max items, you can get very lumpy amounts of data, i.e. actual number of events and/or 'weight' of a batch can vary quite a bit
- if you restrict by the proposed 'max RUs' limit, one can end up with a low number of items

Unfortunately, I could go on. The V2 API provides a very powerful scheme; achieving homegrown solutions was possible as a direct result. Can you please share some more information on the design of this scheme in order to allay my concerns? While it was not my first choice to end up implementing a scheme leaning on this facility, it ultimately provides a very high throughput facility, the loss of which would be significant.

Inferring things from LeaseTokens definitely does not interest me. My desire for information as to which partition a received batch comes from arises from:

- it's in V2
- it's very valuable to see if some partition is stalled from a troubleshooting/diagnostics perspective
- when reading ahead, you don't want to get too far ahead on any individual partition in the interests of fairness

In short, if I have 100 partitions and this process is covering 33 of them, I want to a) know that and b) be able to control my progress on all 33 of them, not just through indirect means like the Estimator.
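The fairness point can be sketched as a per-range cap on buffered batches. This is a minimal Python simulation under stated assumptions: `ReadAheadLimiter`, its methods, and the `"range-0"` style identifiers are hypothetical, and it presumes each delivered batch carries some range or partition identifier, which is exactly the context being requested.

```python
from collections import defaultdict


class ReadAheadLimiter:
    """Hypothetical helper: caps read-but-not-yet-processed batches per range."""

    def __init__(self, max_in_flight_per_range):
        self._max = max_in_flight_per_range
        self._in_flight = defaultdict(int)

    def try_admit(self, range_id):
        """Return True if another batch from this range may be buffered."""
        if self._in_flight[range_id] >= self._max:
            return False  # the reader for this range should pause
        self._in_flight[range_id] += 1
        return True

    def completed(self, range_id):
        """Release one slot once a batch from this range has been processed."""
        if self._in_flight[range_id] > 0:
            self._in_flight[range_id] -= 1

    def snapshot(self):
        """Per-range in-flight counts, e.g. for logging which range is stalled."""
        return {r: n for r, n in self._in_flight.items() if n > 0}


limiter = ReadAheadLimiter(max_in_flight_per_range=2)
admitted = [limiter.try_admit("range-0") for _ in range(3)]  # third attempt is refused
other = limiter.try_admit("range-1")                         # other ranges are unaffected
limiter.completed("range-0")
retry = limiter.try_admit("range-0")                         # a slot has been freed
```

Without a range identifier on each batch, neither the cap nor the `snapshot` diagnostics can be keyed correctly, which is the gap described above.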

Thanks for the serialization suggestions. Might I suggest the CFP migration example show Document parsing vs doing that with Dynamic (have not tried to attack it or looked at the code, but it would seem that it would be relatively easy yet valuable to demo?)?

bartelink · 2020-02-11T19:19:35Z

@ealsur still really interested in this - we're looking to move to V4 but can't even get off V2 until this API comes back

ealsur · 2020-02-11T23:39:29Z

This is coming back in V3, right now we got jumped by high-pri work. We want to have that and Context and Estimator with all leases for March.

ylibrach · 2020-03-19T16:50:03Z

Hi @ealsur , just wanted to check if ETA is still sometime in March?

ealsur · 2020-03-19T18:33:29Z

Sadly it got a bit pushed back but prioritized still. We are working to release Change Feed pull support and this will be worked right after.

ylibrach · 2020-04-28T19:48:47Z

@ealsur, aside from bringing CheckpointAsync into v3 and v4, I'm also curious as to what is the current path for creating a ChangeFeedProcessor in v4? It seems that ChangeFeedProcessorBuilder<T> has been made internal in v4.

ealsur · 2020-04-28T21:28:27Z

V4 won't have Change Feed Processor for the time being (short time), we are working on the base API surface. V4 is not ready for production and it needs first to pass review of base APIs, once base APIs are approved, we can start to onboard features.

ylibrach · 2020-04-28T21:37:30Z

I see. By "short time", what do you estimate?

bartelink · 2020-05-17T15:09:52Z

Now that the RU cost discrepancies in V3 are finally resolved, the most significant blocker for moving to V3 for transactional processing has been removed from my perspective.

This brings this Issue back into focus for me; @ealsur

- are you still thinking V3 is the likely release vehicle for the return of CheckpointAsync?
- alongside that, the context in the batch delivery is also a gap

For my roadmap purposes, indicative dates are naturally always welcome, but I guess I'm also wondering when all 3 concerns (RU consumption, see #990 (comment); the reintroduction of CheckpointAsync; and the reintroduction of Context info) will be covered in a single release.

I'm ready to validate all of these in the context of V4 when they land.

ealsur · 2020-05-18T04:50:41Z

Once Change Feed pull model goes GA, I have one PR to enable Estimator per lease, and then another PR to introduce Checkpoint. The blocker for all this is GA of Change Feed pull model.

bartelink · 2020-08-08T14:46:38Z

Any chance of a quick update on how this is all looking in terms of dependencies?

We've been long-fingering various CFP issues on the basis that we'll be moving from V2 to V3 within a reasonable timeframe.

(CheckpointAsync and source context for delivered batches are the critical items of interest that represent blockers)

This was referenced Aug 1, 2019

Port to Azure.Cosmos / v3 SDK jet/equinox#144

Merged

Port to Azure.Cosmos / V3 SDK jet/propulsion#15

Closed

bartelink mentioned this issue Aug 10, 2019

Using SqlQuerySpec with CosmosClient #663

Closed

This was referenced Sep 5, 2019

Cosmos ChangeFeedOptions? #766

Closed

Adding Change Feed Processor code migration sample #782

Merged

Provide partitionid to ChangeFeedObserverFactory.Create #400

Closed

ealsur added the ChangeFeed label Sep 23, 2019

bartelink mentioned this issue Oct 28, 2019

Change feed processor cannot read from the beginning #938

Closed

bartelink mentioned this issue Dec 28, 2019

FeedIterator Diagnostics: Provide ability to determine LeaseToken along with batches #1122

Closed

bartelink mentioned this issue Feb 11, 2020

Added Stream support to ChangeFeedProcessor #888

Closed

3 tasks

ealsur self-assigned this Feb 11, 2020

This was referenced Apr 1, 2020

Lease by Physical Partition Key Range Azure/azure-documentdb-changefeedprocessor-dotnet#148

Open

[Internal] PR Requirements: Add PR title requirements to allow changelog generation #1322

Merged

bartelink mentioned this issue May 17, 2020

feat!(CosmosStore): Azure.Cosmos 'V4' support / enable mocking Cosmos APIs jet/equinox#197

Draft

16 tasks

bartelink mentioned this issue Jun 18, 2020

Cosmos: Dispose queries/results if releasing against Cosmos SDK V3 instead of V4 jet/equinox#225

Closed

bartelink mentioned this issue Aug 8, 2020

feat(Query, ChangeFeed): Provide ability to specify max Request Charge budget #1763

Open

ealsur mentioned this issue Aug 10, 2020

Change Feed Processor: Add support for manual checkpoint #1765

Closed

bartelink mentioned this issue Sep 28, 2020

Rename/reorg .Cosmos to .CosmosStore jet/equinox#243

Merged

ealsur mentioned this issue Mar 23, 2021

[Preview] ChangeFeedProcessor: Adds support for manual checkpoint, context, and stream #2331

Merged

1 task

ealsur closed this as completed in #2331 Apr 2, 2021
