Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tablet throttler: post 18 refactoring, race condition fixes, unit & race testing, deprecation of HTTP checks #14181

Merged
merged 19 commits into from
Dec 13, 2023

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Oct 4, 2023

Description

Followup to #13341

Addresses and fixes #14180. This PR is quite large and incorporates many changes; some planned in advance, some found while investigating #14164, and some on the go as race detections were added.

To make sense of the changes, most commits in this PR can be considered in isolation:

  • 0ac7b55 removes excessive use of goroutine, partly to fix a race condition, partly to simplify the code.
  • 6c728ff dereferences a typed struct, just by means of simplifying the code.
  • 23a1872 removes InstanceKey type; all probes are now identified by a tablet alias, and all access to tablets is done by tablt alias or tablet reference. Any hostname:port access (or information) is gone.
  • 2d7b093 removes the old HTTP-based probing. The throttler post v18 now only uses gRPC for inter-tablet checks.
  • 5e3bd0d uses more idiomatic create/destroy flow upon Open() & Close(). Things that don't need to operate when the throttler is Close()d now won't.
  • 0ecb51b adapts to recent golang atomic types.
  • 92edb62, 4cefacd, and 13ea301 introduce proper unit testing, with some emphasis on race detection. Some implied code fixes have been made due to race detection. Some minor refactoring to throttler code (fields, functions) so that unit testing is made possible.
  • 11cebfc updates LICENSEs.

Related Issue(s)

Fixes #14180

It also fixes issue #14178 for post-v18. However, a dedicated PR #14179 fixes #14178 for both post v18 as well as v18, v17, v16 backport.

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…s); exhaust Operate() channels (mostly for unit tests)

Signed-off-by: Shlomi Noach <[email protected]>
…sts); 'select' on ctx.Done() when writing to channels

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Oct 4, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Oct 4, 2023
@github-actions github-actions bot added this to the v19.0.0 milestone Oct 4, 2023
@shlomi-noach shlomi-noach changed the title Tablet throttler: post 18 refactors and objectives Tablet throttler: post 18 refactoring, race condition fixes, unit & race testing, deprecation of HTTP checks Oct 4, 2023
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach requested review from mattlord and a team October 4, 2023 12:30
@shlomi-noach shlomi-noach removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Oct 5, 2023
@shlomi-noach shlomi-noach removed the NeedsIssue A linked issue is missing for this Pull Request label Oct 5, 2023
Signed-off-by: Shlomi Noach <[email protected]>
Copy link
Contributor

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Nov 25, 2023
@shlomi-noach shlomi-noach removed the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Nov 26, 2023
@shlomi-noach
Copy link
Contributor Author

Request for review 🙏

Signed-off-by: Shlomi Noach <[email protected]>
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! ❤️ I only had minor comments/questions/nits so will leave it up to you how to best resolve.

CacheMillis int
QueryInProgress int64
}

// Probes maps instances to probe(s)
type Probes map[InstanceKey](*Probe)
// Probes maps tablet aliases to probe(s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's worth it or not, but we could use the *topodatapb.TabletAlias type as the key instead of string. String is fine too, IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a good point. It will be yet another somewhat significant change what with all the tests. I'm happy to give this a go in a followup PR, as this PR is already bloated as it is.

defer requestCancel()
throttlerConfig, err := throttler.readThrottlerConfig(requestCtx)
if err == nil {
log.Errorf("Throttler.retryReadAndApplyThrottlerConfig(): success reading throttler config: %+v", throttlerConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be INFO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

for {
if !throttler.IsOpen() {
// Throttler is not open so no need to keep retrying.
log.Errorf("Throttler.retryReadAndApplyThrottlerConfig(): throttler no longer seems to be open, exiting")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be INFO or maybe WARNING?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

select {
case <-ctx.Done():
// Throttler is not open so no need to keep retrying.
log.Errorf("Throttler.retryReadAndApplyThrottlerConfig(): throttler no longer seems to be open, exiting")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like WARNING to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 699 to 700
if throttler.IsOpen() {
if throttler.tabletTypeFunc() == topodatapb.TabletType_PRIMARY {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be slightly more readable if it was if throttler.IsOpen() && throttler.tabletTypeFunc() == topodatapb.TabletType_PRIMARY {

Copy link
Contributor Author

@shlomi-noach shlomi-noach Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewritten

Comment on lines 806 to 807
probe := probe
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's nicer to pass the value into the groutine function, but this is also fine. Personal pref I suppose, I just find it more clear/deliberate/obvious. Maybe we can just add a comment about the closure issue instead for future readers so they don't delete the line as "unneeded". :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewritten. The one thing I don't like about the pass-the-value-to-goroutine paradigm, is that the value is passed 20 lines of code below. Otherwise I recognize it's the dominant paradigm.

)

// TestProbesPostDisable runs the throttler for some time, and then investigates the internal throttler maps and values.
// While the therottle is disabled, it is technically safe to iterate those structures. However, `go test -race` disagrees,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: therottle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. I'm now working with a new spellchecker that is not as intrusive as others I've worked with, let's see how it improves my typos rate.

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach merged commit f64919b into vitessio:main Dec 13, 2023
116 checks passed
@shlomi-noach shlomi-noach deleted the throttler-post-18-refactor branch December 13, 2023 06:03
ejortegau pushed a commit to slackhq/vitess that referenced this pull request Dec 13, 2023
…ace testing, deprecation of HTTP checks (vitessio#14181)

Signed-off-by: Shlomi Noach <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Throttler Type: Bug Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Internal Cleanup
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tablet throttler: post-v18 changes Table throttler: race condition
3 participants