Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/engine: MVCC Metamorphic test suite, first phase #44458

Merged
merged 4 commits into from
Feb 6, 2020

Conversation

itsbilal
Copy link
Member

@itsbilal itsbilal commented Jan 28, 2020

This PR adds a new test-only sub-package to engine, metamorphic,
which has one test, TestMeta, that generates and runs random MVCC
operations on rocksdb and pebble instances with default settings.

Future additions to this test suite could include:

  • A "check" mode that takes an output file as input, parses it, runs the operations in that sequence, and compares output strings.
  • Diffing test output between rocksdb and pebble and failing if there's a difference
  • Adding support for more operations
  • Adding a "restart" operation that closes the engine and restarts a different kind of engine in the store directory, then confirming operations after that point generate the same output.

First-but-biggest part of #43762 .

Release note: None

@itsbilal itsbilal self-assigned this Jan 28, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@itsbilal
Copy link
Member Author

I'll designate this as WIP for now until I finish the ability to parse/check an output file later today. Until that happens, some of the parsing code is unused and buggier than I'd like.

@itsbilal itsbilal changed the title storage/engine: MVCC Metamorphic test suite, first phase storage/engine: [WIP] MVCC Metamorphic test suite, first phase Jan 29, 2020
@itsbilal
Copy link
Member Author

Fixed bugs in the parsing paths, TestMeta when re-run with its own output files now passes. Ready for review again.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass comments that jumped out at me. I'll continue to review this later today.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 35 at r1 (raw file):

	engine      engine.Engine
	tsGenerator tsGenerator
	managers    map[operandType]operandManager

I suspect there is a level of abstraction here that isn't serving much purpose. Given that there are a fixed number of "*managers", rather than a map you could embed the concrete types here. There are a few places where you're using a variable as a key to managers, but my suspicion is that is due to the operand abstraction which I'm also questioning.


pkg/storage/engine/metamorphic/operands.go, line 30 at r1 (raw file):

const (
	OPERAND_TRANSACTION operandType = iota

Nit: Go style is camelcase for constants. For example, operandTransaction.


pkg/storage/engine/metamorphic/operations.go, line 32 at r1 (raw file):

// dependantOps commands should be stateless, with all state stored in the
// passed-in test runner or its operand managers.
type mvccOp struct {

You took a different approach here than what is done in the Pebble metamorphic test. Rather than having an op interface with concrete implementations that refer to concrete arguments, you've abstracted the operands. Do you mind motivating this different approach? On the surface, it seems to push more work to the op execution phase. That isn't necessarily problematic. It also mixes in generation with execution (e.g. mvccOp.weight which shouldn't be needed at execution time). I'm also not quite clear on what to make of dependentOps. In the Pebble metamorphic test, op dependencies are implicit in their ordering.

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/storage/engine/metamorphic/operations.go, line 32 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You took a different approach here than what is done in the Pebble metamorphic test. Rather than having an op interface with concrete implementations that refer to concrete arguments, you've abstracted the operands. Do you mind motivating this different approach? On the surface, it seems to push more work to the op execution phase. That isn't necessarily problematic. It also mixes in generation with execution (e.g. mvccOp.weight which shouldn't be needed at execution time). I'm also not quite clear on what to make of dependentOps. In the Pebble metamorphic test, op dependencies are implicit in their ordering.

The operand abstraction came from an ideal goal of being able to cleanly define what operands depended on others - so if you need a batch and have zero batches, you'll just create one by inserting a batch's opener, and the same for transactions, etc. I figured this would be an easier way to generate correct sequences of operations, to just insert openers as necessary, instead of restricting your generator to the set of legal operations (a set that can be of high complexity to completely generate; the interaction between batches and transactions being the most tricky one that comes to mind). The hope was also that all state information about active objects would live in their managers only.

In practice it all ended up being more intertwined and less modular than I was hoping, but the general idea still works.

All non-opener dependent operations are defined in dependentOps. I should probably describe this in a comment too, but basically, every time you "randomly generate" an operation, you go through its operands list, see if any need to be opened, then recursively run their openers (if necessary), then run the dependentOps method, and recursively run those operations (which could also require running more openers and more dependentOps), then you run the operation itself.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/storage/engine/metamorphic/operands.go, line 58 at r3 (raw file):

}

type operand interface{}

Usage of the empty interface is a big signal that something is off with the abstractions. The required use of type assertions is a related indicator. My instinct is that there should be abstraction at the op level, with concrete implementations of the various MVCC operations that know the concrete types of their parameters/operands. For example:

type mvccScanOp struct {
  reader Reader
  start roachpb.Key
  end roachpb.Key
  max int64
  ...
}

One nuance here is that you don't want to store an actual engine.Reader in mvccScanOp. That's why the Pebble metamorphic tests use IDs for all of the objects. The level of indirection allows an op to refer to an object that hasn't been created yet. This is similar to the way you have ops retrieving objects from managers.


pkg/storage/engine/metamorphic/operations.go, line 32 at r1 (raw file):

The operand abstraction came from an ideal goal of being able to cleanly define what operands depended on others - so if you need a batch and have zero batches, you'll just create one by inserting a batch's opener, and the same for transactions, etc.

I don't think it implies the operand abstraction. I think you primarily get this functionality via the mvccOp.operands specification.

It isn't clear to me if this goal is necessary. The Pebble metamorphic test generator simply skips generating an op if it is applied to an object for which there are no live objects. For example, if you try to generate an iter.Close op and there are no live iterators. Immediately opening and then closing an iterator doesn't seem to be terribly useful.

The hope was also that all state information about active objects would live in their managers only.

I'm good with placing the state information about active objects in the managers. My feeling is that the managers should be a op-generation time component. Currently they are also an op-execution component as well.

In practice it all ended up being more intertwined and less modular than I was hoping, but the general idea still works.

I think it is worthwhile to disentangle the code and state needed for generation from the state needed for executing ops. The current mvccOps structure is good on the generation side, but the way execution is mixed in there is a bit messy. And I think this will get even messier when you add parsing and what not.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This testing infrastructure is complex enough, that it is beneficial for it to have its own tests so you can debug the component pieces in isolation. It would be nice to have a test that outputs the ops that are generated (without running them) for a particular seed and count. See also TestGenerator in the Pebble metamorphic test which verifies that all of the live objects are properly closed. Another bit that could use a test is the parsing. Right now that is a bit difficult because execution is interwoven with parsing. Being able to test parsing in isolation is a good reason for disentangling that.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 248 at r3 (raw file):

func (t *tsGenerator) randomPastTimestamp(rng *rand.Rand) hlc.Timestamp {
	var result hlc.Timestamp
	result.WallTime = int64(float64(t.lastTS.WallTime+1) * rng.Float64())

I suspect it would be better for the historic times to be skewed towards the current time. Not sure precisely how I would accomplish that. Perhaps with a zipfian distribution.


pkg/storage/engine/metamorphic/meta_test.go, line 119 at r3 (raw file):

	defer leaktest.AfterTest(t)
	ctx := context.Background()
	seeds := []int64{123}

By default, I think you should generate a few random seeds. Perhaps make this a command line flag.


pkg/storage/engine/metamorphic/meta_test.go, line 136 at r3 (raw file):

	}
	for _, seed := range seeds {
		t.Run(fmt.Sprintf("seed=%d", seed), func(t *testing.T) {

The seed won't generate the same series of operations if the use of the random number generator changes. Is that a problem?


pkg/storage/engine/metamorphic/operands.go, line 112 at r3 (raw file):

	}

	return k.liveKeys[int(k.rng.Float64()*float64(len(k.liveKeys)))]

Better to do k.liveKeys[k.rng.Intn(len(k.liveKeys))]. Using floats to generate random numbers in a range like this is actually not uniformly random which is sort of surprising. I don't think it really matters, but it is better to use the supported idiom for generating integers in a range.

@itsbilal
Copy link
Member Author

Thanks for the review!

I just added support for more operations, as well as standardized behaviour between rocksdb and pebble enough that you can pass output from one engine into the other with --check and it should always pass.

Functionally I'll cap this PR off at this point. I'll make another parallel PR for the restart point work - which should be straightforward now that everything it depends on is done. Once that is up and running (hopefully with no obvious failures) I'll come back here and address code review comments - so thanks for doing a review quickly.

I'm open to pulling out all execution bits from the mvccOp struct and making it part of a concrete struct for each operation, complete with concrete bound parameters (that might also need to be IDs, for the reason you described.). operands can be ID strings going forward, and just parsed by the relevant manager upon execution time. The dependentOps stuff can be purely a generation-time factor, as well as a new onGenerated handler can handle populating any managers with the new state of the universe post-op generation (eg. removing closed IDs, keeping track of batches with in-flight writes for a transaction, etc).

And speaking about the file checking mode, it turned out to be relatively straightforward, because assuming the rest of the state machine behaves predictably and the output file was well generated, all dependentOps and openers would have already run by the time an operation shows up.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally I'll cap this PR off at this point.

Let me give this another review today and we'll try to get it merged in its current form. The suggestions I'm making can be handled in follow-on PRs. Probably easier for everyone that way.

operands can be ID strings going forward, and just parsed by the relevant manager upon execution time.

I'm suggesting that there is an explicit parsing step for both operators and operands. If an operator has an int operator, it should be an int field in the associated mvccOp struct.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/storage/engine/metamorphic/deck.go, line 24 at r4 (raw file):

// ensures that each element is returned with a desired frequency within the
// size of the deck.
type Deck struct {

Only used within this package, so this can be unexported: s/Deck/deck/g.


pkg/storage/engine/metamorphic/generator.go, line 100 at r4 (raw file):

// generateAndRun generates n operations using a TPCC-style deck shuffle with
// weighted probabilities of each operation appearing.
func (m *metaTestRunner) generateAndRun(n uint64) {

Nit: we're unlikely to run more than 2 billion operations. s/uint64/int/g. This also simplifies the loop below slightly.


pkg/storage/engine/metamorphic/generator.go, line 101 at r4 (raw file):

// weighted probabilities of each operation appearing.
func (m *metaTestRunner) generateAndRun(n uint64) {
	m.ops = make([]*mvccOp, n)

You're initializing m.ops to length n, but then seem to be appending to m.ops in runOp. I think the end result is that you'll end up with len(m.ops) == 2 * n. Perhaps you meant to make([]*mvccOp, 0, n)?


pkg/storage/engine/metamorphic/generator.go, line 105 at r4 (raw file):

	for i := uint64(0); i < n; i++ {
		opToAdd := &operations[deck.Int()]

Are you adding the op somewhere? Seems like you're just running it. Perhaps s/opToAdd/op/g.


pkg/storage/engine/metamorphic/meta_test.go, line 47 at r4 (raw file):

// createTestPebbleEngine returns a new in-memory Pebble storage engine.
func createTestPebbleEngine(path string) (engine.Engine, error) {
	return engine.NewEngine(enginepb.EngineTypePebble, 1<<20, base.StorageConfig{

The base.StorageConfig looks identical between this function and createTestRocksDBEngine. Perhaps pull out into a helper: func makeStorageConfig(dir string) base.StorageConfig.


pkg/storage/engine/metamorphic/meta_test.go, line 109 at r4 (raw file):

				testRunner.parseFileAndRun(checkFile)
			} else {
				testRunner.generateAndRun(10000)

Add a TODO about making this configurable. Note that in Pebble, the default is a random number in the range [5000,10000).


pkg/storage/engine/metamorphic/operands.go, line 49 at r4 (raw file):

// writers, etc) should be stored in an operandManager.
type operandManager interface {
	get() operand

Can you provide some commentary around these? get() seems to generate an operand (would generate be a better name?). opener() seems to return the name of an operator.


pkg/storage/engine/metamorphic/operands.go, line 178 at r4 (raw file):

	liveTxns        []*roachpb.Transaction
	txnIdMap        map[string]*roachpb.Transaction
	inFlightBatches map[*roachpb.Transaction][]engine.Batch

s/inFlightBatches/openBatches/g?


pkg/storage/engine/metamorphic/operands.go, line 256 at r4 (raw file):

				i--
				found = true
				// Don't break; this batch could appear multiple times.

Can you elaborate? Why would the same batch appear multiple times in the in-flight batches list.


pkg/storage/engine/metamorphic/operations.go, line 131 at r4 (raw file):

// List of operations, where each operation is defined as one instance of mvccOp.
var operations = []mvccOp{

I think there are some operations missing here. Can you add a TODO? For example, MVCCConditionalPut, MVCCBlindPut, MVCCMerge, MVCCClearTimeRange. Perhaps you've added these in subsequent PRs.

I'm not seeing ingestion, which is another important bit of surface area to trigger. The way ingestion works in the Pebble metamorphic test is that it converts a batch into an sstable and ingests that. Something similar could be done here. Again, a TODO is sufficient for this PR.


pkg/storage/engine/metamorphic/operations.go, line 161 at r4 (raw file):

			val, intent, err := engine.MVCCGet(ctx, reader, key.Key, txn.ReadTimestamp, engine.MVCCGetOptions{
				Inconsistent: false,
				Tombstones:   true,

Add a TODO that all of these parameters should be operands.


pkg/storage/engine/metamorphic/operations.go, line 313 at r4 (raw file):

				result = append(result, opRun{
					op:   m.nameToOp["batch_commit"],
					args: []operand{batch},

I don't want to beat a dead horse too severely, but this is another indication of the problem with the current abstractions. "batch_commit" is referring to an mvccOp and requiring specific arguments. There is no type checking here on the operands. No type checking that you've spelled "batch_commit" correctly. The alternative would be to have something like:

result = append(result, batchCommitOp{
    batch: batch,
 })

Let's leave this change to a follow-on PR, as it will be extensive.


pkg/storage/engine/metamorphic/operations.go, line 529 at r4 (raw file):

				tmpKey := key
				key = endKey
				endKey = tmpKey

The Go idiom for swapping two variables is: key, endKey = endKey, key. No need for tmpKey.


pkg/storage/engine/metamorphic/operations.go, line 563 at r4 (raw file):

			OPERAND_MVCC_KEY,
		},
		weight: 5,

I suspect the weight of this operation should be even lower than this. We don't perform manual compactions very often, usually only after a delete-range.

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very extensive review, really appreciated! I've addressed all the simpler points, and noted the major refactor ones for a follow-up PR. I'll update the restarts PR once this is merged.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @sumeerbhola)


pkg/storage/engine/metamorphic/deck.go, line 24 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Only used within this package, so this can be unexported: s/Deck/deck/g.

Done.


pkg/storage/engine/metamorphic/generator.go, line 35 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I suspect there is a level of abstraction here that isn't serving much purpose. Given that there are a fixed number of "*managers", rather than a map you could embed the concrete types here. There are a few places where you're using a variable as a key to managers, but my suspicion is that is due to the operand abstraction which I'm also questioning.

Deferred for a followup PR that refactors operand abstraction.


pkg/storage/engine/metamorphic/generator.go, line 248 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I suspect it would be better for the historic times to be skewed towards the current time. Not sure precisely how I would accomplish that. Perhaps with a zipfian distribution.

Done.


pkg/storage/engine/metamorphic/generator.go, line 100 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: we're unlikely to run more than 2 billion operations. s/uint64/int/g. This also simplifies the loop below slightly.

Done.


pkg/storage/engine/metamorphic/generator.go, line 101 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You're initializing m.ops to length n, but then seem to be appending to m.ops in runOp. I think the end result is that you'll end up with len(m.ops) == 2 * n. Perhaps you meant to make([]*mvccOp, 0, n)?

Done. Whoops. I was keeping this around for printing output runs in the end, but it's ultimately unnecessary so I got rid of this slice altogether.


pkg/storage/engine/metamorphic/generator.go, line 105 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Are you adding the op somewhere? Seems like you're just running it. Perhaps s/opToAdd/op/g.

Done.


pkg/storage/engine/metamorphic/meta_test.go, line 119 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

By default, I think you should generate a few random seeds. Perhaps make this a command line flag.

Given test runtime can be quite slow, I've resorted to adding 1 fixed seed, 1 user-specified seed, and 1 random seed.


pkg/storage/engine/metamorphic/meta_test.go, line 136 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The seed won't generate the same series of operations if the use of the random number generator changes. Is that a problem?

It shouldn't be an issue. The purpose of having a seeded RNG is to more easily be able to recreate a test run if the output file gets lost for whatever reason (eg. on CI). Presumably if there's a code change, the seed will not exactly match up to the same deterministic set of operations, but that's alright.


pkg/storage/engine/metamorphic/meta_test.go, line 47 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The base.StorageConfig looks identical between this function and createTestRocksDBEngine. Perhaps pull out into a helper: func makeStorageConfig(dir string) base.StorageConfig.

Done.


pkg/storage/engine/metamorphic/meta_test.go, line 109 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Add a TODO about making this configurable. Note that in Pebble, the default is a random number in the range [5000,10000).

Done.


pkg/storage/engine/metamorphic/operands.go, line 30 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: Go style is camelcase for constants. For example, operandTransaction.

Done.


pkg/storage/engine/metamorphic/operands.go, line 58 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Usage of the empty interface is a big signal that something is off with the abstractions. The required use of type assertions is a related indicator. My instinct is that there should be abstraction at the op level, with concrete implementations of the various MVCC operations that know the concrete types of their parameters/operands. For example:

type mvccScanOp struct {
  reader Reader
  start roachpb.Key
  end roachpb.Key
  max int64
  ...
}

One nuance here is that you don't want to store an actual engine.Reader in mvccScanOp. That's why the Pebble metamorphic tests use IDs for all of the objects. The level of indirection allows an op to refer to an object that hasn't been created yet. This is similar to the way you have ops retrieving objects from managers.

Noted. Will address this in a refactor in an upcoming PR.


pkg/storage/engine/metamorphic/operands.go, line 112 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Better to do k.liveKeys[k.rng.Intn(len(k.liveKeys))]. Using floats to generate random numbers in a range like this is actually not uniformly random which is sort of surprising. I don't think it really matters, but it is better to use the supported idiom for generating integers in a range.

Done.


pkg/storage/engine/metamorphic/operands.go, line 49 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can you provide some commentary around these? get() seems to generate an operand (would generate be a better name?). opener() seems to return the name of an operator.

Done. Get usually does not generate an operand if there already exists one. I say usually because keys are an exception; we want to sometimes reuse keys and other times return new ones. Added commentary about this.


pkg/storage/engine/metamorphic/operands.go, line 178 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/inFlightBatches/openBatches/g?

It's batches that have writes that need to happen before this transaction commits - so maybe "in-flight" makes more sense? The transaction struct itself has an inFlightWrites field.


pkg/storage/engine/metamorphic/operands.go, line 256 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can you elaborate? Why would the same batch appear multiple times in the in-flight batches list.

At write time, we don't check if this transaction has already written to this batch. We just append it to the end of this slice. It's up to the read case to resolve the dedups. I've clarified it here.


pkg/storage/engine/metamorphic/operations.go, line 32 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The operand abstraction came from an ideal goal of being able to cleanly define what operands depended on others - so if you need a batch and have zero batches, you'll just create one by inserting a batch's opener, and the same for transactions, etc.

I don't think it implies the operand abstraction. I think you primarily get this functionality via the mvccOp.operands specification.

It isn't clear to me if this goal is necessary. The Pebble metamorphic test generator simply skips generating an op if it is applied to an object for which there are no live objects. For example, if you try to generate an iter.Close op and there are no live iterators. Immediately opening and then closing an iterator doesn't seem to be terribly useful.

The hope was also that all state information about active objects would live in their managers only.

I'm good with placing the state information about active objects in the managers. My feeling is that the managers should be a op-generation time component. Currently they are also an op-execution component as well.

In practice it all ended up being more intertwined and less modular than I was hoping, but the general idea still works.

I think it is worthwhile to disentangle the code and state needed for generation from the state needed for executing ops. The current mvccOps structure is good on the generation side, but the way execution is mixed in there is a bit messy. And I think this will get even messier when you add parsing and what not.

Noted. Will address in a follow-up PR.


pkg/storage/engine/metamorphic/operations.go, line 131 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think there are some operations missing here. Can you add a TODO? For example, MVCCConditionalPut, MVCCBlindPut, MVCCMerge, MVCCClearTimeRange. Perhaps you've added these in subsequent PRs.

I'm not seeing ingestion, which is another important bit of surface area to trigger. The way ingestion works in the Pebble metamorphic test is that it converts a batch into an sstable and ingests that. Something similar could be done here. Again, a TODO is sufficient for this PR.

Done.


pkg/storage/engine/metamorphic/operations.go, line 161 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Add a TODO that all of these parameters should be operands.

Done.


pkg/storage/engine/metamorphic/operations.go, line 313 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I don't want to beat a dead horse too severely, but this is another indication of the problem with the current abstractions. "batch_commit" is referring to an mvccOp and requiring specific arguments. There is no type checking here on the operands. No type checking that you've spelled "batch_commit" correctly. The alternative would be to have something like:

result = append(result, batchCommitOp{
    batch: batch,
 })

Let's leave this change to a follow-on PR, as it will be extensive.

Noted for a follow on PR. Thanks!


pkg/storage/engine/metamorphic/operations.go, line 529 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The Go idiom for swapping two variables is: key, endKey = endKey, key. No need for tmpKey.

Done.


pkg/storage/engine/metamorphic/operations.go, line 563 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I suspect the weight of this operation should be even lower than this. We don't perform manual compactions very often, usually only after a delete-range.

Lowered it to 2.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo the comment on the type of inFlightBatches. Definitely want to see the follow-on PRs as I think the clarification will be significant.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/storage/engine/metamorphic/generator.go, line 35 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Deferred for a followup PR that refactors operand abstraction.

Ack.


pkg/storage/engine/metamorphic/operands.go, line 58 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Noted. Will address this in a refactor in an upcoming PR.

Ack.


pkg/storage/engine/metamorphic/operands.go, line 178 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

It's batches that have writes that need to happen before this transaction commits - so maybe "in-flight" makes more sense? The transaction struct itself has an inFlightWrites field.

The transaction's in-flight writes refers to network operations (I think).


pkg/storage/engine/metamorphic/operands.go, line 256 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

At write time, we don't check if this transaction has already written to this batch. We just append it to the end of this slice. It's up to the read case to resolve the dedups. I've clarified it here.

Seems like what you want is the set of uncommitted batches per transaction. Could you change the type of inFlightBatches to map[*roachpb.Transaction]map[engine.Batch]struct{}?


pkg/storage/engine/metamorphic/operations.go, line 32 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Noted. Will address in a follow-up PR.

Ack.


pkg/storage/engine/metamorphic/operations.go, line 131 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Done.

I'm not sure if my list was exhaustive. Did you double-check that there aren't other operations?


pkg/storage/engine/metamorphic/operations.go, line 313 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Noted for a follow on PR. Thanks!

Ack.

This PR adds a new test-only sub-package to engine, metamorphic,
which has one test, TestMeta, that generates and runs random MVCC
operations on rocksdb and pebble instances with default settings.

Future additions to this test suite could include:

 - A "check" mode that takes an output file as input, parses it,
   runs the operations in that sequence, and compares output
   strings.
 - Diffing test output between rocksdb and pebble and failing if
   there's a difference
 - Adding support for more operations
 - Adding a "restart" operation that closes the engine and
   restarts a different kind of engine in the store directory,
   then confirming operations after that point generate the
   same output.

First-but-biggest part of cockroachdb#43762 .

Release note: None
When run with the --check <filename> flag, TestMeta will parse
the specified output file from a past TestMeta run for operations.
It will run them in that order, and check for matching output
strings. If there's a mismatch, the test will throw an error.

Will be useful for when this test suite will try swapping engine
types after a restart and will verify if the output stays the same.

Release note: None.
Implements more MVCC operations such as MVCCDelete,
MVCCInconsistent{Get,Scan}.

Also updates batch iterator use to make the behaviour appear the
same between rocksdb and pebble. There are some minor differences
in the implementations: for instance, RocksDB batch iterators don't
support SeekLT, Prev, don't respond well to SeekGEs before lower bounds,
and appear to be SeekGE'd to the lower bound upon initialization.
This branch ensures batch iterators are used so that TestMeta
runs on rocksdb and pebble generate the same output. Important
for when output from one type of TestMeta run is --check ed with
the other engine.

Release note: None.
@itsbilal itsbilal changed the title storage/engine: [WIP] MVCC Metamorphic test suite, first phase storage/engine: MVCC Metamorphic test suite, first phase Feb 5, 2020
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/storage/engine/metamorphic/operands.go, line 178 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The transaction's in-flight writes refers to network operations (I think).

Done.


pkg/storage/engine/metamorphic/operands.go, line 256 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Seems like what you want is the set of uncommitted batches per transaction. Could you change the type of inFlightBatches to map[*roachpb.Transaction]map[engine.Batch]struct{}?

Done.


pkg/storage/engine/metamorphic/operations.go, line 131 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm not sure if my list was exhaustive. Did you double-check that there aren't other operations?

There are others: MVCCIterate (seems unused), MVCCResolveWriteIntent (which we kinda already call in txn_commit), MVCCFindSplitKey, MVCCIncrement, MVCCInitPut, as well as some variants of Get/Put meant for protos that we can probably ignore. I'll add them too, and say that this isn't an exhaustive list.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal and @sumeerbhola)


pkg/storage/engine/metamorphic/operands.go, line 262 at r8 (raw file):

func (t *txnManager) clearBatch(batch engine.Batch) {
	for _, batches := range t.openBatches {
		for batch2 := range batches {

Is this loop necessary? Can't you blindly delete(batches, batch) here?

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed linter-reported issues too. TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/storage/engine/metamorphic/operands.go, line 262 at r8 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is this loop necessary? Can't you blindly delete(batches, batch) here?

🤦‍♂️ yes, you can. My bad.

@itsbilal itsbilal force-pushed the mvcc-metamorphic-tests branch 2 times, most recently from dc0bb59 to ad8f2d0 Compare February 6, 2020 19:52
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Feb 6, 2020
This PR builds on top of cockroachdb#44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes cockroachdb#43762 .

Release note: None.
Lots of minor changes such as renaming all operandTypes to
camel casing, TODO insertions wherever necessary, the use
of the pastTS operand type for inconsistent gets/scans, and
the use of rand.Zipf for generating past timestamps.

Release note: None
@itsbilal
Copy link
Member Author

itsbilal commented Feb 6, 2020

bors r+

craig bot pushed a commit that referenced this pull request Feb 6, 2020
44458: storage/engine: MVCC Metamorphic test suite, first phase r=itsbilal a=itsbilal

This PR adds a new test-only sub-package to engine, metamorphic,
which has one test, TestMeta, that generates and runs random MVCC
operations on rocksdb and pebble instances with default settings.

Future additions to this test suite could include:

- [x]  A "check" mode that takes an output file as input, parses it, runs the operations in that sequence, and compares output strings.
- [ ]  Diffing test output between rocksdb and pebble and failing if there's a difference
- [ ]  Adding support for more operations
- [ ]  Adding a "restart" operation that closes the engine and restarts a different kind of engine in the store directory, then confirming operations after that point generate the same output.

First-but-biggest part of #43762 .

Release note: None

44730: storage: fix some tests that were fooling themselves r=andreimatei a=andreimatei

A couple of tests wanted multiple write too old errors, but they were
running at the wrong timestamp and so they were really getting a single
one.
Also add a test showing a funky scenario where 1PC batches behave
differently from non-1PC.

Release note: None

44786: jobs: allow blocking job adoption via sentinel file r=dt a=dt

This change adds a check for a sentinel file, DISABLE_STARTING_BACKGROUND_JOBS,
in the first on-disk store directory to the job adoption loop. If it is found,
jobs will not be adopted by that node and current job executions are cancelled.

Operators can thus touch this file if a misbehaving job is causing problems
and otherwise preventing traditional job control that requires being able to
write to the database.

Release note (general change): background job execution is disabled on nodes where the file DISABLE_STARTING_BACKGROUND_JOBS exists in the first store directory providing an emergency tool for disabling job execution.

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: David Taylor <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 6, 2020

Build succeeded

@craig craig bot merged commit b03ca7f into cockroachdb:master Feb 6, 2020
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Feb 7, 2020
This PR builds on top of cockroachdb#44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

It also adds more MVCC operations that would be useful to test, such
as MVCCFindSplitKey, MVCCDeleteRange, MVCCClearTimeRange,
MVCCConditionalPut, etc.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes cockroachdb#43762 .

Release note: None.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Feb 10, 2020
This PR builds on top of cockroachdb#44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

It also adds more MVCC operations that would be useful to test, such
as MVCCFindSplitKey, MVCCDeleteRange, MVCCClearTimeRange,
MVCCConditionalPut, etc.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes cockroachdb#43762 .

Release note: None.
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Feb 12, 2020
This PR builds on top of cockroachdb#44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

It also adds more MVCC operations that would be useful to test, such
as MVCCFindSplitKey, MVCCDeleteRange, MVCCClearTimeRange,
MVCCConditionalPut, etc.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes cockroachdb#43762 .

Release note: None.
craig bot pushed a commit that referenced this pull request Feb 12, 2020
44630:  engine/metamorphic: Add engine restarts, compare across runs r=itsbilal a=itsbilal

This PR builds on top of #44458 (all commits before the last one
are from that PR). It adds two things: one, it ensures that whenever
successive engine types are tested, it does a comparison for matching
outputs. This will ensure CI will fail if any change down the line
causes an observed difference in behaviour between rocksdb and pebble.

Furthermore, it adds a new kind of operation: restart. Restarts are
special in that they're added after the specified n number of runs
have happened; so a test run with 3 specified engine types
(so 2 restarts), and n = 10000 will have 3 * 10000 = 30000 operations.
A restart closes all open objects, then closes the engine, and starts
up the next engine in the specified engine sequence. This, combined
with the aforementioned checking functionality across different
engine runs, lets us test for bidirectional compatibility across
different engines.

Fixes #43762 .

Release note: None.

Co-authored-by: Bilal Akhtar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants