Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add TestMVCCHistories metamorphic param for TBIs #86256

Merged

Conversation

erikgrinaker
Copy link
Contributor

This patch adds the metamorphic test parameter
mvcc-histories-separate-engine-blocks, which will configure Pebble
with very small blocks and trigger a flush after every command. This
may uncover bugs in TBI optimizations.

Release justification: non-production code changes

Release note: None

@erikgrinaker erikgrinaker requested review from jbowens and a team August 16, 2022 20:13
@erikgrinaker erikgrinaker requested a review from a team as a code owner August 16, 2022 20:13
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor Author

@jbowens In addition to the issue in cockroachdb/pebble#1895, this also occasionally fails with a goroutine leak, after about 6 seconds. Am I holding this wrong somehow?

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Filed cockroachdb/pebble#1898 for the goroutine exit race at Close.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)

@erikgrinaker
Copy link
Contributor Author

@jbowens Any chance we can get that race fixed? I'd really like to merge this (and get some similar metamorphic params in) for better TBI test coverage.

@erikgrinaker
Copy link
Contributor Author

@jbowens Afraid I'm still seeing goroutine leaks after the Pebble bump, even when I give it 30 seconds to clean up. This only happens when I call Flush() after every write.

    mvcc_history_test.go:644: Leaked goroutine: goroutine 5517 [chan receive]:
        github.com/cockroachdb/pebble.(*tableCacheShard).releaseLoop.func1({0x1e87760, 0xc000640120})
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:324 +0x9f
        runtime/pprof.Do({0x1e876f0?, 0xc0001b2000?}, {{0xc00004eca0?, 0x5e4a60?, 0xe4c9c0?}}, 0xc0004d47a8)
        	GOROOT/src/runtime/pprof/runtime.go:40 +0xa3
        github.com/cockroachdb/pebble.(*tableCacheShard).releaseLoop(0xc0004d4798?)
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:322 +0x58
        created by github.com/cockroachdb/pebble.(*tableCacheShard).init
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:314 +0xe5
        Leaked goroutine: goroutine 5521 [chan receive]:
        github.com/cockroachdb/pebble.(*tableCacheShard).releaseLoop.func1({0x1e87760, 0xc0006402a0})
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:324 +0x9f
        runtime/pprof.Do({0x1e876f0?, 0xc0001b2000?}, {{0xc00004eca0?, 0x0?, 0xe4c9c0?}}, 0xc000652fa8)
        	GOROOT/src/runtime/pprof/runtime.go:40 +0xa3
        github.com/cockroachdb/pebble.(*tableCacheShard).releaseLoop(0xc000652f98?)
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:322 +0x58
        created by github.com/cockroachdb/pebble.(*tableCacheShard).init
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:314 +0xe5
        Leaked goroutine: goroutine 5537 [chan receive]:
        github.com/cockroachdb/pebble.(*tableCacheShard).releaseLoop.func1({0x1e87760, 0xc00036c540})
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:324 +0x9f
        runtime/pprof.Do({0x1e876f0?, 0xc0001b2000?}, {{0xc00004eca0?, 0x1e876f0?, 0xe4c9c0?}}, 0xc00049efa8)
        	GOROOT/src/runtime/pprof/runtime.go:40 +0xa3
        github.com/cockroachdb/pebble.(*tableCacheShard).releaseLoop(0xc00049ef98?)
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:322 +0x58
        created by github.com/cockroachdb/pebble.(*tableCacheShard).init
        	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/table_cache.go:314 +0xe5

jbowens added a commit to jbowens/cockroach that referenced this pull request Aug 31, 2022
Fix a couple instances of leaked iterators in unit tests in the storage
package.

This should also resolve the goroutine leak observed in cockroachdb#86256.

Informs cockroachdb#71481.

Release justification: Non-production code changes
Release note: None
craig bot pushed a commit that referenced this pull request Aug 31, 2022
86363: bazel: make `go test` process timeout before bazel kills it r=rickystewart a=healthy-pod

Previously, bazel will kill the test process if it goes
beyond the timeout duration set for its size. This prevented
us from knowing which tests timed out and also prevented us
from getting their stack traces.

This patch causes the `go test` process to timeout before
bazel kills it to allow us to know which test timed out and
get its stack trace.

Closes #78185

Release justification: Non-production code changes
Release note: None

87067: descs,catkv: rewrite Collection backend r=postamar a=postamar

This rewrite was motivated by the upcoming need to validate descriptors
up to different levels depending on why they were being read. Doing so
requires tracking the validation level in a more granular fashion within
the Collection's caches. Trying to make this work within the existing
scheme where the uncommitted and stored descriptors are all in the same
layer turned out to be painful and hard to reason about.

This commit does several things:
1. The uncommitted descriptors are spun off into their own distinct layer
again.
2. The stored descriptors now mirror what's in storage and so that whole
layer could effectively be moved to catkv.
3. Execution flow through the layers in the Collection is now
straightforwardly implemented and easy to reason about. Validation and
hydration now take place in unique, well-defined steps at the very end
of a lookup, be it by name or by ID.
3. Instances of this new stored descriptors layer are now also used by
the lease manager, the Direct() interface provided by the Collection,
etc.
4. As a result the logic in catkv could be significantly cleaned up.
In particular, lookups for descriptor validation are significantly
less convoluted now.
5. The SystemNamespaceCache object was also moved to catkv and now
supports caching descriptors, and so was renamed to
SystemCatalogCache.

Despite significant rewriting of the inner workings of the descriptors
collection, this commit should not functionally change anything.

Informs #85263.

Release justification: low-risk, high-benefit change.
Release note: None


87103: storage: fix mvcc stats on gc of range tombstone over del r=erikgrinaker a=aliher1911

Previously if range tombstone was placed over delete, GC would
not correctly update GC bytes age incorrectly using range tombstone ts
as age of underlying tombstone.

Release justification: Bugfix
Release note: None

Fixing: #87042

87212: ci: set appropriate test tmpdir for compose test r=srosenberg a=rickystewart

Otherwise the tmpdir defaults to `/artifacts` which is non-existent
outside of a Docker container.

Release justification: Non-production code changes
Release note: None

87229: storage: fix leaked iterators in unit tests r=erikgrinaker a=jbowens

Fix a couple instances of leaked iterators in unit tests in the storage
package.

This should also resolve the goroutine leak observed in #86256.

Informs #71481.

Release justification: Non-production code changes
Release note: None

Co-authored-by: healthy-pod <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
This patch adds the metamorphic test parameter
`mvcc-histories-separate-engine-blocks`, which will configure Pebble
with very small blocks and trigger a flush after every command. This
may uncover bugs in TBI optimizations.

Release justification: non-production code changes

Release note: None
@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Sep 1, 2022

Thanks for getting to the bottom of the goroutine leaks here @jbowens!

CI failed with some weird npm-run-all: command not found error, assuming it's dependency flake or some such.

bors r=tbg,jbowens

@craig
Copy link
Contributor

craig bot commented Sep 1, 2022

Build succeeded:

@craig craig bot merged commit 3a84b55 into cockroachdb:master Sep 1, 2022
@erikgrinaker erikgrinaker deleted the mvcc-histories-metamorphic-tbi branch September 1, 2022 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants