-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: surface (*pebble.DB).Close errors, audit iterator leaks #71481
Comments
jbowens
added
the
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
label
Oct 12, 2021
jbowens
changed the title
storage: surface (
storage: surface (*pebble.DB).Close errors, audit iterator leaks
Oct 12, 2021
jbowens
added
A-storage
Relating to our storage engine (Pebble) on-disk storage.
T-storage
Storage Team
labels
Oct 12, 2021
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Aug 31, 2022
Fix a couple instances of leaked iterators in unit tests in the storage package. This should also resolve the goroutine leak observed in cockroachdb#86256. Informs cockroachdb#71481. Release justification: Non-production code changes Release note: None
craig bot
pushed a commit
that referenced
this issue
Aug 31, 2022
86363: bazel: make `go test` process timeout before bazel kills it r=rickystewart a=healthy-pod Previously, bazel will kill the test process if it goes beyond the timeout duration set for its size. This prevented us from knowing which tests timed out and also prevented us from getting their stack traces. This patch causes the `go test` process to timeout before bazel kills it to allow us to know which test timed out and get its stack trace. Closes #78185 Release justification: Non-production code changes Release note: None 87067: descs,catkv: rewrite Collection backend r=postamar a=postamar This rewrite was motivated by the upcoming need to validate descriptors up to different levels depending on why they were being read. Doing so requires tracking the validation level in a more granular fashion within the Collection's caches. Trying to make this work within the existing scheme where the uncommitted and stored descriptors are all in the same layer turned out to be painful and hard to reason about. This commit does several things: 1. The uncommitted descriptors are spun off into their own distinct layer again. 2. The stored descriptors now mirror what's in storage and so that whole layer could effectively be moved to catkv. 3. Execution flow through the layers in the Collection is now straightforwardly implemented and easy to reason about. Validation and hydration now take place in unique, well-defined steps at the very end of a lookup, be it by name or by ID. 3. Instances of this new stored descriptors layer are now also used by the lease manager, the Direct() interface provided by the Collection, etc. 4. As a result the logic in catkv could be significantly cleaned up. In particular, lookups for descriptor validation are significantly less convoluted now. 5. The SystemNamespaceCache object was also moved to catkv and now supports caching descriptors, and so was renamed to SystemCatalogCache. Despite significant rewriting of the inner workings of the descriptors collection, this commit should not functionally change anything. Informs #85263. Release justification: low-risk, high-benefit change. Release note: None 87103: storage: fix mvcc stats on gc of range tombstone over del r=erikgrinaker a=aliher1911 Previously if range tombstone was placed over delete, GC would not correctly update GC bytes age incorrectly using range tombstone ts as age of underlying tombstone. Release justification: Bugfix Release note: None Fixing: #87042 87212: ci: set appropriate test tmpdir for compose test r=srosenberg a=rickystewart Otherwise the tmpdir defaults to `/artifacts` which is non-existent outside of a Docker container. Release justification: Non-production code changes Release note: None 87229: storage: fix leaked iterators in unit tests r=erikgrinaker a=jbowens Fix a couple instances of leaked iterators in unit tests in the storage package. This should also resolve the goroutine leak observed in #86256. Informs #71481. Release justification: Non-production code changes Release note: None Co-authored-by: healthy-pod <[email protected]> Co-authored-by: Marius Posta <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Oleg Afanasyev <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Jackson Owens <[email protected]>
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 1, 2022
Panic if an error is encountered while closing the Engine. This ensures unit tests and the like observe errors, especially related to leaked iterators. Close cockroachdb#71481. Release justification: low-risk bug fixes and non-production code changes Release note: None
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 7, 2022
When the crdb_test build flag is provided, fatal the process if an engine Close returns an error. This ensures unit tests and the like observe errors, especially related to leaked iterators. Close cockroachdb#71481. Release justification: low-risk bug fixes and non-production code changes Release note: None
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 8, 2022
Panic if an error is encountered while closing the Engine. This ensures unit tests and the like observe errors, especially related to leaked iterators. Close cockroachdb#71481. Release justification: low-risk bug fixes and non-production code changes Release note: None
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 12, 2022
When the crdb_test build flag is provided, fatal the process if an engine Close returns an error. This ensures unit tests and the like observe errors, especially related to leaked iterators. Close cockroachdb#71481. Release justification: low-risk bug fixes and non-production code changes Release note: None
craig bot
pushed a commit
that referenced
this issue
Sep 14, 2022
87232: storage: check error on Engine close r=erikgrinaker a=jbowens Panic if an error is encountered while closing the Engine. This ensures unit tests and the like observe errors, especially related to leaked iterators. Close #71481. Close #87507. Release justification: low-risk bug fixes and non-production code changes Release note: None 87928: build: fix Pebble metamorphic race nightly r=nicktrav a=jbowens Fix the Pebble metamorphic nightly with race detector enabled. A change to command-line argument parsing has been preventing it from running. Release note: None Co-authored-by: Jackson Owens <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
Sep 14, 2022
When the crdb_test build flag is provided, fatal the process if an engine Close returns an error. This ensures unit tests and the like observe errors, especially related to leaked iterators. Close #71481. Release justification: low-risk bug fixes and non-production code changes Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
See #71303.
There are more places where test cases leak iterators. During test shutdown when the engine is closed
(*pebble.DB).Close
returns an error indicating that an iterator was leaked. The*storage.Pebble
type discards this error. Any unit tests that leak iterators succeed as a result.We should fix these leaked iterators. I'm mostly worried about potential usage of these iterators after or concurrent with the closing of the engine.
We've seen a few issues that appear as memory corruption within the block cache #70154 and https://cockroachlabs.slack.com/archives/C026MSSL926/p1633479580006300.
Jira issue: CRDB-10598
The text was updated successfully, but these errors were encountered: