Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Migrate P2P/state tests to new framework #1160

Merged
merged 17 commits into from
Mar 13, 2023

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Mar 6, 2023

Relevant issue(s)

Resolves #1153

Description

I need to be able to do some quite complicated testing with P2P and Schema Updates, and so have merged the P2P/state tests into the primary integration test framework. I think it is quite a nice change regardless of the immediate need too.

As quite a lot has changed in this PR, and a lot moved - I think similar to the previous integration test rework PR the code that has been moved should be reviewed as if it was brand new - it is a good opportunity to further clean it up, and it is probably a waste of everyone's energy to try differentiate between what is new/re-written and what has just been moved.

The additional testing to be done as part of Schema Updates is not in scope for this PR. The tests remain the same, minus a single test that was removed (it was a sanity check for the old framework).

Strongly recommend reviewing commit by commit, and reading the full commit message. This got a bit fiddly at points and the commits are perhaps not as clean/clear cut as I'd like, but still much better than just trying to read the whole thing in one go - sorry about that.

Also fixes a previously undetected P2P race condition between node Start and SetReplicator. And temporarily removes some error magic that does not appear to quite be working (was not caught previously as P2P tests only ran on badger).

Will be running the test action a bunch of times whilst in review to see if there is any flakiness left - I am happy tolerating some if found, as the current framework is slightly flaky and I see fully solving that as out of scope, but I definitely do not wish to make it worse. I ran them 20 or so times (CI) without a failure, I think all flakiness may finally be fixed. #1154 is still an issue, got a single failure out of 50 runs. Will not fix in this PR.

@AndrewSisley AndrewSisley added bug Something isn't working area/testing Related to any test or testing suite refactor This issue specific to or requires *notable* refactoring of existing codebases and components area/p2p Related to the p2p networking system action/no-benchmark Skips the action that runs the benchmark. labels Mar 6, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.5 milestone Mar 6, 2023
@AndrewSisley AndrewSisley requested a review from a team March 6, 2023 20:21
@AndrewSisley AndrewSisley self-assigned this Mar 6, 2023
@jsimnz
Copy link
Member

jsimnz commented Mar 6, 2023

praise: Thanks for removing the whole dbInfo abstraction!

@AndrewSisley AndrewSisley force-pushed the sisley/test/I1153-mv-p2p-core-framework branch 2 times, most recently from bd7ca96 to 07d7e68 Compare March 7, 2023 17:03
@codecov
Copy link

codecov bot commented Mar 7, 2023

Codecov Report

❗ No coverage uploaded for pull request base (develop@31654fa). Click here to learn what that means.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop    #1160   +/-   ##
==========================================
  Coverage           ?   68.58%           
==========================================
  Files              ?      180           
  Lines              ?    17012           
  Branches           ?        0           
==========================================
  Hits               ?    11668           
  Misses             ?     4388           
  Partials           ?      956           
Impacted Files Coverage Δ
net/peer.go 46.29% <100.00%> (ø)

@AndrewSisley AndrewSisley force-pushed the sisley/test/I1153-mv-p2p-core-framework branch 2 times, most recently from 0a01528 to 0dfbd90 Compare March 8, 2023 00:37
Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change overall and I think it makes it much easier to read the test cases. I do have a few comments that should be addressed before I can approve :)

net/peer.go Show resolved Hide resolved
tests/integration/events/utils.go Show resolved Hide resolved
tests/integration/p2p.go Show resolved Hide resolved
Comment on lines 52 to 53
// All document changes made in the source node will be synced to the target node.
// Changes made in the target node will not be synced to the source node.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Changes in the target will be synced if it‘s an update to a document that is on both nodes. So a new document in the source will sync to the target and an update to that document on the target will sync to the source. Maybe the comment should be updated to represent that.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers Fred - will change. I think we may have a test gap there though (might add a ticket).

  • Fix ConfigureReplicator docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test and updated the docs, it required a minor rejig of the wait mechanics (PR FIXUP - Document replicator target to source sync).

Comment on lines 64 to 65
// Any peers connected to this node in the network will have all document changes synced to
// them from this node.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: That is an incorrect description of the behaviour. It should say that any changes to documents within the collection on any network nodes will sync to this node.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers! I think we may have a test gap here too, as I think I remember being unsure about this in particular and couldn't find any tests that demonstrated this distinction (I think they may all be one-one node tests, where the result appears on all nodes present).

  • Fix documentation, maybe add ticket for test gap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have tests for this, I missed them. Docs have been updated


// The `n.Peer.AddP2PCollections(colIDs)` call above is calling some asynchronous functions
// for the pubsub subscription and those functions can take a bit of time to complete.
// Since it is possible to get to the create and update functions bellow before the subscription
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Remove bellow as it's no longer accurate.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, thanks for flagging...

  • Fix out of date comment

Comment on lines 287 to 288
for _, db := range dbs {
executeTestCase(ctx, t, collectionNames, testCase, db)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Since the parameter name is dbt, maybe it would be more appropriate to use dbt here as well instead of db which can be confused with a client.DB type.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will change

  • rename db=> dbt var

Comment on lines 87 to 88
// This document is created in all nodes before the replicator is set up.
// Updates should be synced across nodes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: in node 1 instead of in all nodes.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 fair enough, will change

  • tweak comment

Comment on lines 224 to 240
if action.NodeID.HasValue() && action.NodeID.Value() == cfg.TargetNodeID {
targetToSourceEvents += 1
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: This situation will not result in a sync so we shouldn't wait. Currently it will cause the test to timeout if we try to test that a create on the target will not sync to the source.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will test and get back to you. I spent an annoying amount of time getting these if block/waits correct, but that time was also polluted with the bug fixed in this PR and I trust you here more than I trust me.

  • Try and remove, if tests fail check in with Fred

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestP2PWithMultipleDocumentUpdatesPerNode failed on first test run without this line (targetToSourceEvents += 1). I think either you may be mistaken (possibly due to production bug), or there is another bug somewhere in the test framework (perhaps unlikely given the nature of this line/failure).

Leaving as is for now, but it might be worth more of your attention as I am not so sure what the correct P2P behaviour should be, only what it appears to actually be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually that failure makes no sense, as removing that line shouldn't increase the chances of a time out, only increase the chances of checking the results prematurely - likely an unhappy coincidence, and suggests that there may still be some flakiness. Will investigate further before having another look at this line again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like failure was just #1154 which I thought had been fixed. Removing this if block - thanks Fred

Comment on lines 232 to 252
if action.NodeID.HasValue() && action.NodeID.Value() == cfg.TargetNodeID {
targetToSourceEvents += 1
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: This will only be relevant if the document was previously created on both source and target. We should have a way to verify that before committing to wait for a sync otherwise the test will timeout.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a test gap here then - will try and sort out both gap and this code in this PR.

  • Add test, verify (framework) bug, fix bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On revisit, I think you may be incorrect here as there is no way for the document to not exist in all nodes.

If a document didnt exist everywhere prior to replicator configuration, the replicator would sync it upon configuration. I have just added a test for this (PR FIXUP - Add test for replicator syncing exising doc).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I'm pretty sure I'm correct :)

A replicator is a one way on create. If node 1 replicates to node 2, new docs on node 1 will be copied to node 2. But new docs on node 2 will not replicate to node 1.

However, updating a documents on node 2 that was either seeded to both nodes or created on node 1 will also result in that document being updated on node 1. But a document updated on node 2 that was created on node 2 will not be updated on node 1.

Copy link
Contributor Author

@AndrewSisley AndrewSisley Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😁 Of course lol, was only thinking for creates on the source. Cheers

  • Add test, probably make changes to framework

Flagged by go test -race
It does not work with errors.Is, which is kind of important and breaks P2Ps transaction rentry code
Is an unnessecary complication that makes future changes harder
databaseType will be used more in later commits.  It can stay internal and hidden behind the `Name()` function for the explain tests for now.
We need to be able to initialize database instances from within an action, but they should still be using the correct datastore type! We also require 1-many database instances for a given test (one per node).  This commit enables that, and creates a single 'normal' node if no nodes have been explicitly requested.

The explain utils starts to get in the way a little bit here, and will be properly separated shortly in another commit.
databaseInfo and a handful of utils functions are now only used by the explain tests and represent an annoying code complication and oddity for the main test suite.  This commit removes what it can, and moves the rest into the explain test utils.
Subscriptions, transactional queries and the change detector are currently out of scope, although I see no reason as to why they could not be made to work later.

Actually using these new actions has been split out to another commit in order to keep this on focused, it may however be handy to have a look at the two together to get an idea as to what it looks like to use it.
Does not change the nature of the tests, although I do remember deleting a single test that acted as a sanity check for the old framework.

Does not make use of localised utils.go files in order to hide the schema - whilst they are consistent across a directory I am currently prefering the explicitness here given how complex these tests are.  Do let me know though.
Behaviour does not exist, no need to wait.
Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the changes.

@AndrewSisley AndrewSisley merged commit 3979fa7 into develop Mar 13, 2023
@AndrewSisley AndrewSisley deleted the sisley/test/I1153-mv-p2p-core-framework branch March 13, 2023 22:50
shahzadlone pushed a commit that referenced this pull request Apr 13, 2023
* Remove replicator race condition

Flagged by go test -race

* Remove Defra-IM txn error fanciness

It does not work with errors.Is, which is kind of important and breaks P2Ps transaction rentry code

* Remove dbInfo.Rootstore func

Is an unnessecary complication that makes future changes harder

* Extract datastore strings to consts

databaseType will be used more in later commits.  It can stay internal and hidden behind the `Name()` function for the explain tests for now.

* Move control of db instance to action handler

We need to be able to initialize database instances from within an action, but they should still be using the correct datastore type! We also require 1-many database instances for a given test (one per node).  This commit enables that, and creates a single 'normal' node if no nodes have been explicitly requested.

The explain utils starts to get in the way a little bit here, and will be properly separated shortly in another commit.

* Remove databaseInfo from main test utils

databaseInfo and a handful of utils functions are now only used by the explain tests and represent an annoying code complication and oddity for the main test suite.  This commit removes what it can, and moves the rest into the explain test utils.

* Add P2P capacity to core test framework

Subscriptions, transactional queries and the change detector are currently out of scope, although I see no reason as to why they could not be made to work later.

Actually using these new actions has been split out to another commit in order to keep this on focused, it may however be handy to have a look at the two together to get an idea as to what it looks like to use it.

* Convert existing P2P tests to new system

Does not change the nature of the tests, although I do remember deleting a single test that acted as a sanity check for the old framework.

Does not make use of localised utils.go files in order to hide the schema - whilst they are consistent across a directory I am currently prefering the explicitness here given how complex these tests are.  Do let me know though.

* Remove old P2P/state test framework
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
* Remove replicator race condition

Flagged by go test -race

* Remove Defra-IM txn error fanciness

It does not work with errors.Is, which is kind of important and breaks P2Ps transaction rentry code

* Remove dbInfo.Rootstore func

Is an unnessecary complication that makes future changes harder

* Extract datastore strings to consts

databaseType will be used more in later commits.  It can stay internal and hidden behind the `Name()` function for the explain tests for now.

* Move control of db instance to action handler

We need to be able to initialize database instances from within an action, but they should still be using the correct datastore type! We also require 1-many database instances for a given test (one per node).  This commit enables that, and creates a single 'normal' node if no nodes have been explicitly requested.

The explain utils starts to get in the way a little bit here, and will be properly separated shortly in another commit.

* Remove databaseInfo from main test utils

databaseInfo and a handful of utils functions are now only used by the explain tests and represent an annoying code complication and oddity for the main test suite.  This commit removes what it can, and moves the rest into the explain test utils.

* Add P2P capacity to core test framework

Subscriptions, transactional queries and the change detector are currently out of scope, although I see no reason as to why they could not be made to work later.

Actually using these new actions has been split out to another commit in order to keep this on focused, it may however be handy to have a look at the two together to get an idea as to what it looks like to use it.

* Convert existing P2P tests to new system

Does not change the nature of the tests, although I do remember deleting a single test that acted as a sanity check for the old framework.

Does not make use of localised utils.go files in order to hide the schema - whilst they are consistent across a directory I am currently prefering the explicitness here given how complex these tests are.  Do let me know though.

* Remove old P2P/state test framework
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/p2p Related to the p2p networking system area/testing Related to any test or testing suite bug Something isn't working refactor This issue specific to or requires *notable* refactoring of existing codebases and components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate P2P state tests to new framework
3 participants