Spec and Implement the end of life of a shard. #4056

Closed · 5 tasks done
fulmicoton opened this issue Oct 31, 2023 · 0 comments
Assignees: fulmicoton
Labels: enhancement (New feature or request)

fulmicoton (Contributor) commented Oct 31, 2023

We need to clean up data from the metastore, from the ingester, and from the control plane.

The indexer reports in the metastore and in chitchat that it has reached EOF.

  • On publish, the indexer updates the shard position to ~eof. The list of shards is part of the control plane's view, but their positions are not: we do not want publish to require the control plane to be up.
  • The indexer also publishes its EOF status on chitchat, which is eventually picked up by the control loop's control_running_plan routine.
  • The control plane then attempts to remove the data from the ingester. This call may fail (best effort / optional).
  • The control plane removes the data from the metastore.
  • The control plane removes the data from its internal model.
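
A minimal Rust sketch of this cleanup sequence; `handle_shard_eof` and the `IngesterClient`/`MetastoreClient` traits are hypothetical stand-ins, not Quickwit's actual service APIs:

```rust
use std::collections::HashSet;

type ShardId = u64;

struct ControlPlaneModel {
    shards: HashSet<ShardId>,
}

// Hypothetical client traits standing in for the real RPC clients.
trait IngesterClient {
    fn delete_shard(&self, shard_id: ShardId) -> Result<(), String>;
}

trait MetastoreClient {
    fn delete_shard(&self, shard_id: ShardId) -> Result<(), String>;
}

fn handle_shard_eof(
    shard_id: ShardId,
    ingester: &dyn IngesterClient,
    metastore: &dyn MetastoreClient,
    model: &mut ControlPlaneModel,
) -> Result<(), String> {
    // Best effort: the ingester may be unreachable; the GC pass described
    // below reconciles any leftovers later.
    if let Err(err) = ingester.delete_shard(shard_id) {
        eprintln!("failed to delete shard {shard_id} on ingester: {err}");
    }
    // The metastore deletion must succeed: the metastore is the source of truth.
    metastore.delete_shard(shard_id)?;
    // Only then forget the shard in the control plane's internal model.
    model.shards.remove(&shard_id);
    Ok(())
}
```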

If the ingester is unreachable, there is no way to know whether it will ever come back.
We should therefore have a GC-like process that tells an ingester the list of shards it is supposed to host, so the ingester can remove the difference.
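
A sketch of that GC handshake on the ingester side, assuming a hypothetical `gc_shards` helper over an in-memory queue map (the real ingester state is more involved):

```rust
use std::collections::{HashMap, HashSet};

type ShardId = u64;
type Queue = Vec<Vec<u8>>; // simplified stand-in for a shard's record queue

/// Drops every queue whose shard the control plane no longer expects
/// this ingester to host.
fn gc_shards(queues: &mut HashMap<ShardId, Queue>, expected: &HashSet<ShardId>) {
    queues.retain(|shard_id, _queue| expected.contains(shard_id));
}
```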

If a shard never reaches EOF status because all of its replicas have been disconnected, then we also need a TTL-based solution to remove it. After this removal, the data is lost forever.
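
A sketch of such a TTL sweep; the `SHARD_TTL` value, the `expired_shards` helper, and the `last_seen` map are illustrative assumptions, not part of the spec:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

type ShardId = u64;

// Illustrative TTL; the real value would be a configuration setting.
const SHARD_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Returns the shards whose replicas have been silent for longer than the TTL.
fn expired_shards(last_seen: &HashMap<ShardId, Instant>, now: Instant) -> Vec<ShardId> {
    last_seen
        .iter()
        .filter(|(_, seen)| now.duration_since(**seen) > SHARD_TTL)
        .map(|(shard_id, _)| *shard_id)
        .collect()
}
```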

  • Creation of a shard positions service that creates an eventually consistent view of the published positions of shards, connected to the event broker.
  • Hooking the indexing service to the shard positions service.
  • Hooking the ingesters to the shard positions service, so that they can truncate their queues when they receive an update.
  • Hooking the control plane to delete shards from the metastore when EOF is detected.
  • Having a code path to entirely delete queues from ingesters.
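
A sketch of how the shard positions service could keep its view eventually consistent, assuming positions only move forward so stale or duplicated updates can be dropped; `ShardPositions` and `apply_update` are hypothetical names:

```rust
use std::collections::HashMap;

type ShardId = u64;
type Position = u64;

#[derive(Default)]
struct ShardPositions {
    positions: HashMap<ShardId, Position>,
}

impl ShardPositions {
    /// Applies an update; returns true only if it advanced the known position.
    fn apply_update(&mut self, shard_id: ShardId, position: Position) -> bool {
        let known = self.positions.entry(shard_id).or_insert(0);
        if position > *known {
            *known = position;
            true // forward progress: worth rebroadcasting to subscribers.
        } else {
            false // stale or duplicate update: ignore it.
        }
    }
}
```
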
fulmicoton added the enhancement (New feature or request) label Oct 31, 2023
fulmicoton self-assigned this Oct 31, 2023
fulmicoton added a commit that referenced this issue Nov 17, 2023
Its purpose is to encapsulate all of the logic used to maintain a
distributed, eventually consistent view of the published shard positions
over the cluster.

From the user's point of view, after instantiation:
- indexing pipelines need to feed it with updates (this happens on
  suggest_truncate). This is done by publishing
  `LocalShardPositionsUpdates` to the event broker.
- clients interested in updates can simply subscribe to the
  `ShardPositionsUpdate` events in the event broker. The events
  received can come from a local indexing pipeline or from anywhere in the
  cluster.

The service takes care of deduping/ignoring updates when necessary.

The two objects (Local and not) are very similar, but differ in
semantics.

Preliminary step for #4056
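
A toy illustration of this publish/subscribe flow, using a plain `mpsc` channel in place of the event broker; the struct fields and the truncation reaction are assumptions, not the actual Quickwit types:

```rust
use std::sync::mpsc;
use std::thread;

/// Update published by a local indexing pipeline (on suggest_truncate).
struct LocalShardPositionsUpdates {
    shard_id: u64,
    position: u64,
}

/// Cluster-wide update rebroadcast by the shard positions service.
struct ShardPositionsUpdate {
    shard_id: u64,
    position: u64,
}

fn main() {
    let (tx, rx) = mpsc::channel::<ShardPositionsUpdate>();

    // An ingester-like subscriber truncates its queues on each update.
    let subscriber = thread::spawn(move || {
        for update in rx {
            println!("truncate shard {} up to {}", update.shard_id, update.position);
        }
    });

    // The service turns a local update into a cluster-wide event
    // (after deduping it and gossiping it over chitchat).
    let local = LocalShardPositionsUpdates { shard_id: 7, position: 42 };
    tx.send(ShardPositionsUpdate {
        shard_id: local.shard_id,
        position: local.position,
    })
    .unwrap();

    drop(tx);
    subscriber.join().unwrap();
}
```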
fulmicoton added a commit that referenced this issue Nov 17, 2023
fulmicoton added a commit that referenced this issue Nov 21, 2023
fulmicoton added a commit that referenced this issue Nov 21, 2023
fulmicoton added a commit that referenced this issue Nov 21, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 28, 2023
fulmicoton added a commit that referenced this issue Nov 29, 2023
fulmicoton added a commit that referenced this issue Nov 29, 2023
@guilload guilload closed this as completed Dec 6, 2023