[Monitoring/Telemetry] Force collectors to indicate when they are ready #36153

chrisronline · 2019-05-07T02:52:14Z

Resolves #35799

Some recent changes to the stack monitoring parity tests and to the way we collect usage data within monitoring resurfaced a fair amount of nondeterminism in how our bulk uploader and the /api/stats endpoint fetch data from collectors. Some collectors, notably the maps, visualizations, and reporting usage collectors as well as the ops stats collector, are not ready to report their data synchronously; they need to wait for some async processes to finish first.

Historically, this has not been a problem because on the next collection interval tick, we'd just try to collect from those usage collectors again and eventually we'd get the data. However, recent changes to our collector code now ensure we only collect from usage collectors once a day (on the very first call to collect), instead of at the default monitoring collection interval. Because of this, we ran into a scenario where our parity tests fail because the very first internally collected monitoring document is fairly bare (see #35799). This is the result of certain collectors not being ready to report yet, and the bulk uploader or /api/stats endpoint having no idea this is the case.

This PR fixes this. It requires that every collector (usage or stats) implement a custom async isReady() function that returns true or false. If any known collector is not ready, the bulk uploader will not send its payload to ES, and the /api/stats endpoint will return a 503. To avoid conflicts with the recent changes to usage collection, if any collector is not ready when we try to collect, we effectively reset the flag so that we once again try to fetch from usage collectors. This shouldn't affect the performance benefits introduced by #34609 because the isReady() check will not actually invoke the fetching of usage collectors.
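To make the behavior above concrete, here is a minimal sketch of the readiness gate. It is not the code from this PR: the `Collector` and `CollectorSet` shapes, `areAllCollectorsReady`, and `handleStatsRequest` are illustrative names chosen for this example, and the plain `{ statusCode, body }` object stands in for a real HTTP response.

```javascript
// Illustrative sketch only; names and shapes are assumptions, not the PR's code.
class Collector {
  constructor({ type, fetch, isReady }) {
    if (typeof isReady !== 'function') {
      // Every collector has to decide explicitly how it reports readiness.
      throw new Error(`Collector "${type}" must implement isReady()`);
    }
    this.type = type;
    this.fetch = fetch;
    this.isReady = isReady;
  }
}

class CollectorSet {
  constructor() {
    this.collectors = [];
  }

  register(collector) {
    this.collectors.push(collector);
  }

  // Only asks each collector whether it is ready; never calls fetch().
  async areAllCollectorsReady() {
    const flags = await Promise.all(this.collectors.map(c => c.isReady()));
    return flags.every(Boolean);
  }

  async bulkFetch() {
    return Promise.all(
      this.collectors.map(async c => ({ type: c.type, result: await c.fetch() }))
    );
  }
}

// Hypothetical stand-in for the /api/stats handler: 503 until every collector is ready.
async function handleStatsRequest(collectorSet) {
  if (!(await collectorSet.areAllCollectorsReady())) {
    return { statusCode: 503, body: null };
  }
  return { statusCode: 200, body: await collectorSet.bulkFetch() };
}
```

A bulk uploader could use the same `areAllCollectorsReady()` check and simply skip the upload for that tick when it returns false.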

It's important to note that it was a conscious decision to introduce extra friction by requiring each collector to implement its own isReady() function. Every owner needs to really think about whether their collector needs custom logic here, or can just return true.

For this PR, I have updated each collector and implemented custom logic in the few we identified as needing it, but I also need each owner of the other collectors to weigh in on whether we need to apply custom logic or not. Hopefully, the team tagging feature in GitHub will properly identify all owners, but I will do a pass to ensure everyone is notified here.

Testing

The easiest way to test this is to ensure that no .monitoring-kibana-7-* documents are lacking the fields noted in #35799, but most of the effort will be isolated to each owner's specific usage data and ensuring it's in each and every .monitoring-kibana-7-* document. This isn't meant to be time-consuming, as it should only really affect the first few documents indexed once Kibana starts, but feel free to be as complete as you feel necessary.

Questions/Concerns

It's theoretically possible that we will no longer report any monitoring documents if some collector is stuck in a not ready state. We should probably have a way out of this situation.
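One possible way out, sketched here purely as an illustration and not as something this PR implements, is to stop waiting after a deadline. `createReadinessGate`, `maxWaitMs`, and the injectable `now` clock are all hypothetical names for this example.

```javascript
// Hypothetical escape hatch: proceed anyway once `maxWaitMs` has elapsed
// since the first "not ready" answer. Not part of this PR.
function createReadinessGate(areAllReady, maxWaitMs, now = Date.now) {
  let firstNotReadyAt = null;

  return async function shouldCollect() {
    if (await areAllReady()) {
      firstNotReadyAt = null; // reset so a later stall gets a fresh deadline
      return true;
    }
    if (firstNotReadyAt === null) {
      firstNotReadyAt = now();
    }
    // After the deadline, collect anyway so one stuck collector
    // cannot block reporting forever.
    return now() - firstNotReadyAt >= maxWaitMs;
  };
}
```

The trade-off is that documents sent after the deadline may still be missing data from the stuck collector, which is the situation the readiness check was meant to avoid.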

Suggested Reviewers

To all suggested reviewers: if able, please verify that no special logic is necessary for your collector to be ready to collect the necessary data. Also, ensure that the collector is registered ASAP to avoid any timing issues where the first usage collection doesn't include your usage collector because it wasn't registered yet due to async code happening beforehand (this was true with the reporting code @tsullivan).
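As a rough illustration of the "register ASAP" point, the sketch below registers the collector synchronously and lets isReady() reflect the pending async setup, instead of delaying registration until that setup finishes. The `registry` array, `loadConfig`, and the collector shape are hypothetical and only stand in for whatever a real collector needs.

```javascript
// Illustrative sketch; `registry`, `loadConfig`, and the collector shape are hypothetical.
function registerExampleUsageCollector(registry, loadConfig) {
  let config = null;

  // Kick off the async work, but do not wait for it before registering.
  const initialized = loadConfig().then(result => {
    config = result;
  });

  // Registration happens synchronously, so the first collection pass
  // already knows about this collector and can see that it is not ready.
  registry.push({
    type: 'example_usage',
    isReady: async () => config !== null,
    fetch: async () => ({ enabled: config.enabled }),
  });

  return initialized;
}
```

With this shape, a collection pass that runs before the async setup completes sees a registered but not ready collector and can retry, rather than silently omitting it.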

elasticmachine · 2019-05-07T04:11:22Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

elasticmachine · 2019-05-07T15:03:58Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

elasticmachine · 2019-05-07T15:30:59Z

Pinging @elastic/stack-monitoring

elasticmachine · 2019-05-07T16:07:04Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

mistic

The changes for the upgrade assistant collector LGTM

src/legacy/server/status/routes/api/register_stats.js

src/legacy/server/usage/classes/collector.js

elasticmachine · 2019-05-07T18:08:42Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

elasticmachine · 2019-05-14T03:23:07Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

chrisronline · 2019-05-14T12:00:45Z

retest

elasticmachine · 2019-05-14T13:11:28Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

chrisronline · 2019-05-14T13:40:26Z

retest

elasticmachine · 2019-05-14T14:59:03Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

chrisronline · 2019-05-14T14:59:37Z

retest

elasticmachine · 2019-05-14T16:12:40Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

chrisronline · 2019-05-19T15:35:01Z

retest

elasticmachine · 2019-05-19T17:14:05Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

legrego

Spaces changes LGTM!

chrisronline · 2019-05-20T13:24:23Z

@crob611 It looks like you approved the PR in a comment, but would you mind going through the official approval phase so the kibana-canvas requirement is satisfied? Thanks!

crob611

Canvas looks good

…dy (elastic#36153)

* Initial code to force collectors to indicate when they are ready
* Add and fix tests
* Remove debug
* Add ready check in api call
* Fix prettier complaints
* Return 503 if not all collectors are ready
* PR feedback
* Add retry logic for usage collection in the reporting tests
* Fix incorrect boomify usage
* Fix more issues with the tests
* Just add debug I guess
* More debug
* Try and handle this exception
* Try and make the tests more defensive and remove console logs
* Retry logic here too
* Debug for the reporting tests failure
* I don't like this, but lets see if it works
* Move the retry logic into the collector set directly
* Add support for this new collector
* Localize this
* This shouldn't be static on the class, but rather static for the entire runtime

chrisronline · 2019-05-20T20:40:06Z

Backport:

7.x: 2ac88ca
6.8: 12bbb5e

tsullivan · 2019-07-09T17:37:03Z

It's important to note that it was a conscious decision to introduce extra friction by requiring each collector to implement its own isReady() function. Every owner needs to really think about whether their collector needs custom logic here, or can just return true.

👍

Initial code to force collectors to indicate when they are ready

c69b3ce

chrisronline added 5 commits May 7, 2019 09:18

Add and fix tests

4c9caac

Remove debug

fb45de0

Add ready check in api call

04c17d3

Fix prettier complaints

1d50908

Return 503 if not all collectors are ready

54c1d92

chrisronline marked this pull request as ready for review May 7, 2019 15:30

chrisronline requested review from a team as code owners May 7, 2019 15:30

chrisronline requested review from ycombinator and igoristic May 7, 2019 15:30

chrisronline self-assigned this May 7, 2019

chrisronline added Team:Monitoring Stack Monitoring team review v7.1.0 labels May 7, 2019

chrisronline added v8.0.0 v7.2.0 and removed v7.1.0 labels May 7, 2019

Merge remote-tracking branch 'elastic/master' into monitoring/collector_is_ready

7f761c0

mistic approved these changes May 7, 2019

ycombinator reviewed May 7, 2019

src/legacy/server/status/routes/api/register_stats.js

ycombinator reviewed May 7, 2019

src/legacy/server/usage/classes/collector.js

PR feedback

f27ced8

chrisronline added 2 commits May 19, 2019 11:59

Localize this

945ac46

This shouldn't be static on the class, but rather static for the entire runtime

2e6a650

legrego approved these changes May 20, 2019

crob611 approved these changes May 20, 2019

chrisronline merged commit c87e881 into elastic:master May 20, 2019

chrisronline deleted the monitoring/collector_is_ready branch May 20, 2019 17:04

chrisronline mentioned this pull request May 20, 2019

[7.x] [Monitoring/Telemetry] Force collectors to indicate when they are ready (#36153) #36706

Merged

This was referenced May 23, 2019

[Monitoring] Ops collector is inconsistent with its isReady implementation #36991

Closed

[Monitoring] Once the buffer has any events, the collector is always ready #36995

Merged

[Monitoring] Add tests for logic around waiting for usage collectors #37009

Open

This was referenced Jun 14, 2019

Stats collection throwing a warning in functional tests #38951

Closed

Stats collector changed to 1 #39001

Merged

chrisronline mentioned this pull request Jul 16, 2019

[6.8] [Monitoring/Telemetry] Force collectors to indicate when they are ready (#36153) #41289

Merged

chrisronline added a commit that referenced this pull request Jul 17, 2019

[6.8] [Monitoring/Telemetry] Force collectors to indicate when they are ready (#36153) (#41289)

12bbb5e

* Backport c87e881 to 6.8
* Fix tests
* Add in missing functionality
* Add more missing code

chrisronline added the v6.8.4 label Sep 20, 2019

chrisronline mentioned this pull request Nov 3, 2020

[Telemetry] Usage collectors not using isReady correctly #81944

Closed

4 tasks
