Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engine/metamorphic: Add engine restarts, compare across runs #44630

Merged
merged 1 commit into from
Feb 13, 2020

Conversation

itsbilal
Copy link
Member

@itsbilal itsbilal commented Feb 2, 2020

This PR builds on top of #44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes #43762 .

Release note: None.

@itsbilal itsbilal self-assigned this Feb 2, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@itsbilal
Copy link
Member Author

itsbilal commented Feb 2, 2020

The refactor to-dos about separating generation and execution that apply to #44458 also apply here. I'll do both refactors early in the week; the extra work in this PR is mostly orthogonal to those suggestions.

That's mostly why this PR is a WIP. That, and the code could do with some more cleanup / comments. But it works.

@itsbilal itsbilal force-pushed the mvcc-metamorphic-tests-restarts branch 5 times, most recently from e7c8a49 to 07e243b Compare February 7, 2020 22:27
@itsbilal itsbilal changed the title engine/metamorphic: [WIP] Add engine restarts, compare across runs engine/metamorphic: Add engine restarts, compare across runs Feb 7, 2020
@itsbilal
Copy link
Member Author

itsbilal commented Feb 7, 2020

This is ready for review. I've added many more operations, and now I'm starting to see some differences with restarts enabled (eg. around MVCCFindSplitKey that I need to triage). Increasing operation coverage as well as testing more rocksdb/pebble optinos will yield interesting results I'm sure; right now we rarely have more than one non-L0 SST, which leaves a lot of things untested.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, though I'm eager to see generation and execution detangled.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 36 at r1 (raw file):

		Attrs:           roachpb.Attributes{},
		Dir:             path,
		MustExist:       false,

Nit: there is no need to explicitly set fields to the zero value, though I sometimes do so for clarity. Are you doing so for clarity here? Seems like this initialization and the ones below could be removed. Ditto for Attrs.


pkg/storage/engine/metamorphic/generator.go, line 175 at r1 (raw file):

	deck := newDeck(m.rng, m.weights...)

	for numRestart := 0; numRestart <= m.numRestarts; numRestart++ {

This is a bit odd. Instead of running n operations, you'll be running n*(m.numRestarts+1) operations. And the restarts are not inserted randomly but performed after exactly n operations have been performed. I think this is another artifact of having generation tightly coupled to execution.


pkg/storage/engine/metamorphic/meta_test.go, line 163 at r1 (raw file):

	for _, seed := range seeds {
		t.Run(fmt.Sprintf("seed=%d", seed), func(t *testing.T) {
			// 2 engine sequences with 1 restart each.

Rather than exhaustively listing out the possibilities, I was thinking we'd use a bit of randomness here. Give the test the set of engines it can choose from and select a random engine from that set on each restart. Then you could test with RocksDB-only, Pebble-only, and RocksDB+Pebble. This would also work better in conjunction with generating a random number of restarts, rather than using the number of engines.


pkg/storage/engine/metamorphic/operations.go, line 775 at r1 (raw file):

			m.restart()
			// TODO(itsbilal): Write the old and new engine names here. Would be
			// helpful if the checker could ignore this.

The Pebble metamorphic test ignores "comments" in the output. The presence of comments has been very useful in diagnosing problems.


pkg/storage/engine/metamorphic/operations.go, line 781 at r1 (raw file):

		// The restart operation is special; it's added by the metaTestRunner
		// manually. To prevent it from being inadvertently added by the deck, set
		// weight to 0.

Is there a reason not to be adding it randomly? That's what we do in the Pebble metamorphic test.

@itsbilal itsbilal force-pushed the mvcc-metamorphic-tests-restarts branch from 07e243b to cf4b65d Compare February 10, 2020 22:11
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I've implemented random generation of restarts now. Also added comments, which the restart operation uses to print out before/after engine types.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 36 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: there is no need to explicitly set fields to the zero value, though I sometimes do so for clarity. Are you doing so for clarity here? Seems like this initialization and the ones below could be removed. Ditto for Attrs.

Yeah, there's no point in doing this. Done.


pkg/storage/engine/metamorphic/generator.go, line 175 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This is a bit odd. Instead of running n operations, you'll be running n*(m.numRestarts+1) operations. And the restarts are not inserted randomly but performed after exactly n operations have been performed. I think this is another artifact of having generation tightly coupled to execution.

Done. Made the restart operation yet another randomly-scheduled operation.


pkg/storage/engine/metamorphic/meta_test.go, line 163 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Rather than exhaustively listing out the possibilities, I was thinking we'd use a bit of randomness here. Give the test the set of engines it can choose from and select a random engine from that set on each restart. Then you could test with RocksDB-only, Pebble-only, and RocksDB+Pebble. This would also work better in conjunction with generating a random number of restarts, rather than using the number of engines.

Done.


pkg/storage/engine/metamorphic/operations.go, line 775 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The Pebble metamorphic test ignores "comments" in the output. The presence of comments has been very useful in diagnosing problems.

Done. TODO removed as well.


pkg/storage/engine/metamorphic/operations.go, line 781 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is there a reason not to be adding it randomly? That's what we do in the Pebble metamorphic test.

Done. No good reason for it, other than it made the restarts more controllable at the test runner level. But in practice it looks like random is the way to go; there are too many combinations, and manually iterating them then running each iteration is slower in practice.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo a few minor comments.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 181 at r2 (raw file):

	m.closeAll()
	oldEngineName := m.engineImpls[m.curEngine].name
	m.curEngine++

I wonder if this should randomly chose the engine to restart into, rather than rotating amongst the engines.


pkg/storage/engine/metamorphic/meta_test.go, line 35 at r2 (raw file):

)

func runMetaTestForEngines(ctx context.Context, t *testing.T, seed int64, checkFile io.Reader, outputFile io.Writer, restarts bool, engineImpls []engineImpl) {

The number of arguments is getting difficult to read and error prone. You might want to define a struct to hold them.


pkg/storage/engine/metamorphic/operations.go, line 773 at r2 (raw file):

		name: "restart",
		run: func(ctx context.Context, m *metaTestRunner, args ...operand) string {
			if !m.restarts {

This seems oddly specific. When generation is untangled from execution, a better way to accomplish this is to be be able to define a different set of weights for a test which has a weight of 0 for restarts.

@itsbilal itsbilal force-pushed the mvcc-metamorphic-tests-restarts branch from cf4b65d to d8e6d53 Compare February 11, 2020 23:52
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! Also looking forward to cleaning up and separating generation/execution, once I'm through the issues this test is finding.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @petermattis and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 181 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder if this should randomly chose the engine to restart into, rather than rotating amongst the engines.

That would probably give interesting results. For now the guaranteed toggle is finding enough issues as-is. I can add a random restart mode down the line.


pkg/storage/engine/metamorphic/meta_test.go, line 35 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The number of arguments is getting difficult to read and error prone. You might want to define a struct to hold them.

Done.


pkg/storage/engine/metamorphic/operations.go, line 773 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This seems oddly specific. When generation is untangled from execution, a better way to accomplish this is to be be able to define a different set of weights for a test which has a weight of 0 for restarts.

Noted. It's possible to dynamically set the weight to 0 in metaTestRunner.init already, but the one somewhat nice side effect of no-op-ing this operation at runtime (like we do right now) is you can pass an output file from a run with restarts to a run without restarts to see if it finds a difference.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 181 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

That would probably give interesting results. For now the guaranteed toggle is finding enough issues as-is. I can add a random restart mode down the line.

Ack. This is fine for now, though you could add a TODO about adding randomization in the future.


pkg/storage/engine/metamorphic/operations.go, line 773 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Noted. It's possible to dynamically set the weight to 0 in metaTestRunner.init already, but the one somewhat nice side effect of no-op-ing this operation at runtime (like we do right now) is you can pass an output file from a run with restarts to a run without restarts to see if it finds a difference.

Good point about running the same set of operations with and without restarts. I think you should encapsulate these test-run specific options in some sort of metaTestOpts. The Pebble metamorphic test provides something similar in its testOpts structure, which is persisted to disk. This is a suggestion for the future. The current code is fine for this PR.

@itsbilal itsbilal force-pushed the mvcc-metamorphic-tests-restarts branch from d8e6d53 to 229db24 Compare February 12, 2020 05:20
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 181 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ack. This is fine for now, though you could add a TODO about adding randomization in the future.

Done.

@itsbilal itsbilal force-pushed the mvcc-metamorphic-tests-restarts branch from 229db24 to 68c0a7d Compare February 12, 2020 17:22
@itsbilal
Copy link
Member Author

Reduced operation count to 1000 by default; otherwise, make stress was timing out in CI. The count can now be configured with --operations <count>.

This PR builds on top of cockroachdb#44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

It also adds more MVCC operations that would be useful to test, such
as MVCCFindSplitKey, MVCCDeleteRange, MVCCClearTimeRange,
MVCCConditionalPut, etc.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes cockroachdb#43762 .

Release note: None.
@itsbilal itsbilal force-pushed the mvcc-metamorphic-tests-restarts branch from 68c0a7d to 4c72ea5 Compare February 12, 2020 18:02
@itsbilal
Copy link
Member Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Feb 12, 2020
44630:  engine/metamorphic: Add engine restarts, compare across runs r=itsbilal a=itsbilal

This PR builds on top of #44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes #43762 .

Release note: None.

Co-authored-by: Bilal Akhtar <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 13, 2020

Build succeeded

@craig craig bot merged commit 4c72ea5 into cockroachdb:master Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage/engine: add MVCC metamorphic test
3 participants