Spec and Implement the end of life of a shard. #4056

Closed · 5 tasks done
fulmicoton opened this issue Oct 31, 2023 · 0 comments
Assignees: fulmicoton
Labels: enhancement (New feature or request)

fulmicoton (Contributor) commented Oct 31, 2023

We need to clean up data from the metastore, from the ingester, and from the control plane.

The indexer reports in the metastore and in chitchat that it has reached EOF.

  • On publish, the indexer updates the shard position to ~eof. The list of shards is part of the control plane's view, but their positions are not: we do not want publish to require the control plane to be up.
  • The indexer also publishes its EOF status on chitchat, which is eventually picked up by the control loop's control_running_plan routine.
  • The control plane then attempts to remove the data from the ingester. This call may fail (best effort / optional).
  • The control plane removes the data from the metastore.
  • The control plane removes the data from its internal model.
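
A minimal Rust sketch of this cleanup sequence; `handle_shard_eof` and the `IngesterClient`/`MetastoreClient` traits are hypothetical stand-ins, not Quickwit's actual service APIs:

```rust
use std::collections::HashSet;

type ShardId = u64;

struct ControlPlaneModel {
    shards: HashSet<ShardId>,
}

// Hypothetical client traits standing in for the real RPC clients.
trait IngesterClient {
    fn delete_shard(&self, shard_id: ShardId) -> Result<(), String>;
}

trait MetastoreClient {
    fn delete_shard(&self, shard_id: ShardId) -> Result<(), String>;
}

fn handle_shard_eof(
    shard_id: ShardId,
    ingester: &dyn IngesterClient,
    metastore: &dyn MetastoreClient,
    model: &mut ControlPlaneModel,
) -> Result<(), String> {
    // Best effort: the ingester may be unreachable; the GC pass described
    // below reconciles any leftovers later.
    if let Err(err) = ingester.delete_shard(shard_id) {
        eprintln!("failed to delete shard {shard_id} on ingester: {err}");
    }
    // The metastore deletion must succeed: the metastore is the source of truth.
    metastore.delete_shard(shard_id)?;
    // Only then forget the shard in the control plane's internal model.
    model.shards.remove(&shard_id);
    Ok(())
}
```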

If the ingester is unreachable, there is no way to know whether it will ever come back.
We should therefore have a GC-like process that tells an ingester the list of shards it is supposed to host, so the ingester can remove the difference.
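
A sketch of that GC handshake on the ingester side, assuming a hypothetical `gc_shards` helper over an in-memory queue map (the real ingester state is more involved):

```rust
use std::collections::{HashMap, HashSet};

type ShardId = u64;
type Queue = Vec<Vec<u8>>; // simplified stand-in for a shard's record queue

/// Drops every queue whose shard the control plane no longer expects
/// this ingester to host.
fn gc_shards(queues: &mut HashMap<ShardId, Queue>, expected: &HashSet<ShardId>) {
    queues.retain(|shard_id, _queue| expected.contains(shard_id));
}
```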

If a shard never reaches EOF status because all of its replicas have been disconnected, then we also need a TTL-based solution to remove it. After this removal, the data is lost forever.
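
A sketch of such a TTL sweep; the `SHARD_TTL` value, the `expired_shards` helper, and the `last_seen` map are illustrative assumptions, not part of the spec:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

type ShardId = u64;

// Illustrative TTL; the real value would be a configuration setting.
const SHARD_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Returns the shards whose replicas have been silent for longer than the TTL.
fn expired_shards(last_seen: &HashMap<ShardId, Instant>, now: Instant) -> Vec<ShardId> {
    last_seen
        .iter()
        .filter(|(_, seen)| now.duration_since(**seen) > SHARD_TTL)
        .map(|(shard_id, _)| *shard_id)
        .collect()
}
```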

  • Creation of a shard positions service that creates an eventually consistent view of the published positions of shards, connected to the event broker.
  • Hooking the indexing service to the shard positions service.
  • Hooking the ingesters to the shard positions service, so that they can truncate their queues when they receive an update.
  • Hooking the control plane to delete shards from the metastore when EOF is detected.
  • Having a code path to entirely delete queues from ingesters.
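
A sketch of how the shard positions service could keep its view eventually consistent, assuming positions only move forward so stale or duplicated updates can be dropped; `ShardPositions` and `apply_update` are hypothetical names:

```rust
use std::collections::HashMap;

type ShardId = u64;
type Position = u64;

#[derive(Default)]
struct ShardPositions {
    positions: HashMap<ShardId, Position>,
}

impl ShardPositions {
    /// Applies an update; returns true only if it advanced the known position.
    fn apply_update(&mut self, shard_id: ShardId, position: Position) -> bool {
        let known = self.positions.entry(shard_id).or_insert(0);
        if position > *known {
            *known = position;
            true // forward progress: worth rebroadcasting to subscribers.
        } else {
            false // stale or duplicate update: ignore it.
        }
    }
}
```
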
fulmicoton added the enhancement (New feature or request) label Oct 31, 2023
fulmicoton self-assigned this Oct 31, 2023
fulmicoton added a commit that referenced this issue Nov 17, 2023
Its purpose is to encapsulate all of the logic used to maintain a
distributed, eventually consistent view of the published shard positions
over the cluster.

From the user's point of view, after instantiation:
- indexing pipelines need to feed it with updates (this happens on
  suggest_truncate). This is done by publishing
  `LocalShardPositionsUpdates` to the event broker.
- clients interested in updates can simply subscribe to the
  `ShardPositionsUpdate` events in the event broker. The events
  received can come from a local indexing pipeline or from anywhere in the
  cluster.

The service takes care of deduping/ignoring updates when necessary.

The two objects (Local and not) are very similar, but differ in
semantics.

Preliminary step for #4056
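
A toy illustration of this publish/subscribe flow, using a plain `mpsc` channel in place of the event broker; the struct fields and the truncation reaction are assumptions, not the actual Quickwit types:

```rust
use std::sync::mpsc;
use std::thread;

/// Update published by a local indexing pipeline (on suggest_truncate).
struct LocalShardPositionsUpdates {
    shard_id: u64,
    position: u64,
}

/// Cluster-wide update rebroadcast by the shard positions service.
struct ShardPositionsUpdate {
    shard_id: u64,
    position: u64,
}

fn main() {
    let (tx, rx) = mpsc::channel::<ShardPositionsUpdate>();

    // An ingester-like subscriber truncates its queues on each update.
    let subscriber = thread::spawn(move || {
        for update in rx {
            println!("truncate shard {} up to {}", update.shard_id, update.position);
        }
    });

    // The service turns a local update into a cluster-wide event
    // (after deduping it and gossiping it over chitchat).
    let local = LocalShardPositionsUpdates { shard_id: 7, position: 42 };
    tx.send(ShardPositionsUpdate {
        shard_id: local.shard_id,
        position: local.position,
    })
    .unwrap();

    drop(tx);
    subscriber.join().unwrap();
}
```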
fulmicoton added a commit that referenced this issue Nov 17, 2023
fulmicoton added a commit that referenced this issue Nov 21, 2023
fulmicoton added a commit that referenced this issue Nov 21, 2023
fulmicoton added a commit that referenced this issue Nov 21, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 29, 2023
fulmicoton added a commit that referenced this issue Nov 29, 2023
@guilload guilload closed this as completed Dec 6, 2023