Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: surface (*pebble.DB).Close errors, audit iterator leaks #71481

Closed
jbowens opened this issue Oct 12, 2021 · 0 comments · Fixed by #87232
Closed

storage: surface (*pebble.DB).Close errors, audit iterator leaks #71481

jbowens opened this issue Oct 12, 2021 · 0 comments · Fixed by #87232
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-storage Storage Team

Comments

@jbowens
Copy link
Collaborator

jbowens commented Oct 12, 2021

See #71303.

There are more places where test cases leak iterators. During test shutdown when the engine is closed (*pebble.DB).Close returns an error indicating that an iterator was leaked. The *storage.Pebble type discards this error. Any unit tests that leak iterators succeed as a result.

We should fix these leaked iterators. I'm mostly worried about potential usage of these iterators after or concurrent with the closing of the engine.

We've seen a few issues that appear as memory corruption within the block cache #70154 and https://cockroachlabs.slack.com/archives/C026MSSL926/p1633479580006300.

Jira issue: CRDB-10598

@jbowens jbowens added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 12, 2021
@jbowens jbowens changed the title storage: surface ( storage: surface (*pebble.DB).Close errors, audit iterator leaks Oct 12, 2021
@jbowens jbowens added A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Oct 12, 2021
jbowens added a commit to jbowens/cockroach that referenced this issue Aug 31, 2022
Fix a couple instances of leaked iterators in unit tests in the storage
package.

This should also resolve the goroutine leak observed in cockroachdb#86256.

Informs cockroachdb#71481.

Release justification: Non-production code changes
Release note: None
craig bot pushed a commit that referenced this issue Aug 31, 2022
86363: bazel: make `go test` process timeout before bazel kills it r=rickystewart a=healthy-pod

Previously, bazel will kill the test process if it goes
beyond the timeout duration set for its size. This prevented
us from knowing which tests timed out and also prevented us
from getting their stack traces.

This patch causes the `go test` process to timeout before
bazel kills it to allow us to know which test timed out and
get its stack trace.

Closes #78185

Release justification: Non-production code changes
Release note: None

87067: descs,catkv: rewrite Collection backend r=postamar a=postamar

This rewrite was motivated by the upcoming need to validate descriptors
up to different levels depending on why they were being read. Doing so
requires tracking the validation level in a more granular fashion within
the Collection's caches. Trying to make this work within the existing
scheme where the uncommitted and stored descriptors are all in the same
layer turned out to be painful and hard to reason about.

This commit does several things:
1. The uncommitted descriptors are spun off into their own distinct layer
again.
2. The stored descriptors now mirror what's in storage and so that whole
layer could effectively be moved to catkv.
3. Execution flow through the layers in the Collection is now
straightforwardly implemented and easy to reason about. Validation and
hydration now take place in unique, well-defined steps at the very end
of a lookup, be it by name or by ID.
3. Instances of this new stored descriptors layer are now also used by
the lease manager, the Direct() interface provided by the Collection,
etc.
4. As a result the logic in catkv could be significantly cleaned up.
In particular, lookups for descriptor validation are significantly
less convoluted now.
5. The SystemNamespaceCache object was also moved to catkv and now
supports caching descriptors, and so was renamed to
SystemCatalogCache.

Despite significant rewriting of the inner workings of the descriptors
collection, this commit should not functionally change anything.

Informs #85263.

Release justification: low-risk, high-benefit change.
Release note: None


87103: storage: fix mvcc stats on gc of range tombstone over del r=erikgrinaker a=aliher1911

Previously if range tombstone was placed over delete, GC would
not correctly update GC bytes age incorrectly using range tombstone ts
as age of underlying tombstone.

Release justification: Bugfix
Release note: None

Fixing: #87042

87212: ci: set appropriate test tmpdir for compose test r=srosenberg a=rickystewart

Otherwise the tmpdir defaults to `/artifacts` which is non-existent
outside of a Docker container.

Release justification: Non-production code changes
Release note: None

87229: storage: fix leaked iterators in unit tests r=erikgrinaker a=jbowens

Fix a couple instances of leaked iterators in unit tests in the storage
package.

This should also resolve the goroutine leak observed in #86256.

Informs #71481.

Release justification: Non-production code changes
Release note: None

Co-authored-by: healthy-pod <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 1, 2022
Panic if an error is encountered while closing the Engine. This ensures
unit tests and the like observe errors, especially related to leaked
iterators.

Close cockroachdb#71481.

Release justification: low-risk bug fixes and non-production code changes
Release note: None
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 7, 2022
When the crdb_test build flag is provided, fatal the process if an engine Close
returns an error. This ensures unit tests and the like observe errors,
especially related to leaked iterators.

Close cockroachdb#71481.

Release justification: low-risk bug fixes and non-production code changes
Release note: None
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 8, 2022
Panic if an error is encountered while closing the Engine. This ensures
unit tests and the like observe errors, especially related to leaked
iterators.

Close cockroachdb#71481.

Release justification: low-risk bug fixes and non-production code changes
Release note: None
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 12, 2022
When the crdb_test build flag is provided, fatal the process if an engine Close
returns an error. This ensures unit tests and the like observe errors,
especially related to leaked iterators.

Close cockroachdb#71481.

Release justification: low-risk bug fixes and non-production code changes
Release note: None
craig bot pushed a commit that referenced this issue Sep 14, 2022
87232: storage: check error on Engine close  r=erikgrinaker a=jbowens

Panic if an error is encountered while closing the Engine. This ensures
unit tests and the like observe errors, especially related to leaked
iterators.

Close #71481.
Close #87507.

Release justification: low-risk bug fixes and non-production code changes
Release note: None

87928: build: fix Pebble metamorphic race nightly r=nicktrav a=jbowens

Fix the Pebble metamorphic nightly with race detector enabled. A change to command-line argument parsing has been preventing it from running.

Release note: None

Co-authored-by: Jackson Owens <[email protected]>
@craig craig bot closed this as completed in e4a8a43 Sep 14, 2022
blathers-crl bot pushed a commit that referenced this issue Sep 14, 2022
When the crdb_test build flag is provided, fatal the process if an engine Close
returns an error. This ensures unit tests and the like observe errors,
especially related to leaked iterators.

Close #71481.

Release justification: low-risk bug fixes and non-production code changes
Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-storage Storage Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant