-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CCR] Changed AutoFollowCoordinator to keep track of certain statistics #33684
[CCR] Changed AutoFollowCoordinator to keep track of certain statistics #33684
Conversation
The following stats are being kept track of: 1) The total number of times that auto following a leader index succeed. 2) The total number of times that auto following a leader index failed. 3) The total number of times that fetching a remote cluster state failed. 4) The most recent 256 auto follow failures per auto leader index (e.g. create_and_follow api call fails) or cluster alias (e.g. fetching remote cluster state fails). Each auto follow run now produces a result that is being used to update the stats being kept track of in AutoFollowCoordinator. Relates to elastic#33007
Pinging @elastic/es-distributed |
\cc @jasontedor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martijnvg This looks good. I left some comments.
x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/AutoFollowCoordinator.java
Show resolved
Hide resolved
LOGGER.warn("failure occurred during auto-follower coordination", e); | ||
Consumer<List<AutoFollowResult>> handler = results -> { | ||
for (AutoFollowResult result : results) { | ||
if (result.clusterStateFetchException != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flow is quite similar to updateStats
. Should we combine these to a single method?
private final ClusterState followerClusterState; | ||
private final AutoFollowMetadata autoFollowMetadata; | ||
|
||
private final CountDown autoFollowPatternsCountDown; | ||
private final AtomicReference<Exception> autoFollowPatternsErrorHolder = new AtomicReference<>(); | ||
private final AtomicArray<AutoFollowResult> clusterAliasResults; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just call it autoFollowResults
?
for (Index indexToFollow : leaderIndicesToFollow) { | ||
final AtomicArray<Tuple<Index, Exception>> results = new AtomicArray<>(leaderIndicesToFollow.size()); | ||
for (int i = 0; i < leaderIndicesToFollow.size(); i++) { | ||
final int slot = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we can use another data-structure (instead of AtomicArray) to avoid passing the slot-index around. It may be less error-prone because we now have two slot-indexes (clusterAliasSlot and slot) in handleClusterAlias
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would be better if we just do not pass the clusterAliasSlot
around. Let me see if this would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is better now: 90fb198 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOGGER.debug("Successfully marked leader index [{}] as auto followed", leaderIndexName); | ||
} | ||
if (leaderIndicesCountDown.countDown()) { | ||
finalise(leaderIndicesErrorHolder.get()); | ||
finalise(clusterAliasSlot, new AutoFollowResult(clusterAlias, results.asList())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert that every slot is assigned?
if (autoFollowPatternsCountDown.countDown()) { | ||
handler.accept(autoFollowPatternsErrorHolder.get()); | ||
handler.accept(clusterAliasResults.asList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert that every slot is assigned?
|
||
AutoFollowResult(String clusterAlias) { | ||
this.clusterAlias = clusterAlias; | ||
this.clusterStateFetchException = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just delegate to this(clusterAlias, null)
?
public class AutoFollowStats implements Writeable, ToXContentObject { | ||
|
||
private static final ParseField NUMBER_OF_SUCCESSFUL_INDICES_AUTO_FOLLOWED = | ||
new ParseField("number_of_successful_indices_auto_followed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if indices_auto_followed
is a right term. @jasontedor WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just number_of_successful_followed_indices
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like this name. And I think we don't need ed
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed: 3a15fe6
…method, this avoids handleClusterAlias() having to know about clusterAliasSlot parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @martijnvg.
getLeaderIndicesToFollow(autoFollowPattern, leaderClusterState, followerClusterState, followedIndices); | ||
if (leaderIndicesToFollow.isEmpty()) { | ||
finalise(slot, new AutoFollowResult(clusterAlias)); | ||
}else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a space before else
.
…cs (#33684) The following stats are being kept track of: 1) The total number of times that auto following a leader index succeed. 2) The total number of times that auto following a leader index failed. 3) The total number of times that fetching a remote cluster state failed. 4) The most recent 256 auto follow failures per auto leader index (e.g. create_and_follow api call fails) or cluster alias (e.g. fetching remote cluster state fails). Each auto follow run now produces a result that is being used to update the stats being kept track of in AutoFollowCoordinator. Relates to #33007
The following stats are being kept track of:
(e.g. create_and_follow api call fails) or cluster alias
(e.g. fetching remote cluster state fails).
Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.
The transport and rest actions are added in a follow up PR.
Relates to #33007