This repository has been archived by the owner on May 24, 2022. It is now read-only.

[test_runner] Simplify routing block #109

Merged 1 commit into facebookarchive:master on Jul 29, 2020

Conversation

@marcoguerri (Contributor)

After extracting the pipeline logic, the routing logic was still part
of the pipeline and very monolithic. This patch extracts the routing
logic into a dedicated structure and splits it into routeIn logic
and routeOut logic, which makes it unit-testable.

@facebook-github-bot added the CLA Signed label on Jun 4, 2020
@marcoguerri (Contributor, Author)

Depends on #107.

@codecov (bot) commented Jun 4, 2020

Codecov Report

Merging #109 into master will increase coverage by 0.34%.
The diff coverage is 80.52%.


@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
+ Coverage   64.85%   65.20%   +0.34%     
==========================================
  Files          51       51              
  Lines        2988     3035      +47     
==========================================
+ Hits         1938     1979      +41     
- Misses        810      812       +2     
- Partials      240      244       +4     
Flag          Coverage Δ
#integration  63.02% <80.52%> (+0.38%) ⬆️
#unittests    17.90% <0.00%> (-0.57%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                        Coverage Δ
pkg/runner/test_runner.go             82.21% <80.52%> (+0.25%) ⬆️
plugins/teststeps/example/example.go  45.23% <0.00%> (+2.38%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 18a383e...711a82b.

@xaionaro (Contributor) left a comment

In total:

The only problem that bothers me is related to close(r.routingChannels.routeOut), so I'm requesting changes only because of that. The rest looks good: to my eyes the code is much clearer than it was before. Now I at least roughly understand the high-level logic of what is going on :)

Since this PR is intended to improve code readability, I've made a lot of subjective styling comments; of course, they are not expected to be addressed at all :)

In particular, we once talked about extra nesting levels, which make the code much harder for me to read, so I left a few comments dedicated to this problem. I understand that somebody else could have the opposite perception, so I do not expect any reaction to those.

Comment on lines +244 to +245
if routeInProgress {
continue
}
@xaionaro (Contributor) commented Jun 25, 2020

It took me a few seconds of hard thinking to understand the purpose of this continue. If I understood correctly, the variable routeInProgress is just not needed: why not put the continue directly into case t, chanIsOpen := <-r.routingChannels.routeIn:? Is it to avoid the occasional mistake where a programmer assigns err in that case and expects the for loop to finish? If so, then IMHO the problem is caused by:

		if err != nil {
			break
		}

above, which could be avoided by moving the cleanup routines into a defer (instead of the bottom of this function), so that you can just do:

	return fmt.Errorf("my error")

instead of:

	err = fmt.Errorf("my error")
	...
	if err != nil {
		break
	}
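A minimal runnable sketch of the defer shape suggested here; the channel, the cleanup step, and the error condition are illustrative placeholders, not the actual test_runner code:

package main

import (
	"errors"
	"fmt"
)

// routeLoop places cleanup in a defer so that any error path can
// simply return, instead of assigning err and breaking out of the
// loop to reach cleanup code at the bottom of the function.
func routeLoop(routeIn <-chan string) error {
	defer func() {
		// Cleanup that would otherwise sit at the bottom of the function.
		fmt.Println("cleanup: releasing routing resources")
	}()
	for {
		t, ok := <-routeIn
		if !ok {
			return nil // input channel closed: normal termination
		}
		if t == "bad-target" {
			// Early return is safe: the deferred cleanup still runs.
			return errors.New("failed to route target")
		}
		fmt.Println("routing:", t)
	}
}

func main() {
	in := make(chan string, 2)
	in <- "target001"
	in <- "bad-target"
	close(in)
	fmt.Println("routeLoop returned:", routeLoop(in))
}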

@marcoguerri (Contributor, Author) commented Jul 28, 2020

When we receive something from routeIn, we might not want to continue: if there is no other target pending, we should instead proceed with the injection of the target that we just received. We cannot start an injection only when another one finishes, as that would halt the pipeline if the buffer is emptied and then another target comes in (and we hit the continue clause). So we still need a way to tell whether there is already a goroutine pending for the injection, and we need to check this status in two situations (see the sketch below):

  1. We receive another target
  2. We finish injecting a target

In case 1), we need to check whether another injection is pending. In case 2), we know that routeInProgress needs to be reset: nothing is being injected. I don't think I have found a clean way to get rid of that boolean flag.

The defer approach looks fine, and I can amend in that direction, but I don't think there is an easier way to do what that flag is doing.
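A minimal sketch of the flag-based pattern described above, with hypothetical channel names standing in for the real routingChannels fields; the two situations from the list map onto the two select cases:

package main

import "fmt"

func main() {
	routeIn := make(chan string)
	injectionDone := make(chan string)

	// Producer standing in for the previous pipeline stage.
	go func() {
		routeIn <- "target001"
		routeIn <- "target002"
		close(routeIn)
	}()

	var targets []string     // buffer of targets waiting for injection
	routeInProgress := false // is an injection goroutine already pending?
	open := true

	inject := func(t string) {
		go func() {
			injectionDone <- t // stand-in for writing to stepIn
		}()
	}

	for open || routeInProgress || len(targets) > 0 {
		select {
		case t, ok := <-routeIn:
			if !ok {
				open = false
				routeIn = nil // disable this case once upstream closes
				continue
			}
			targets = append(targets, t)
			// Situation 1: a new target arrived; only start an
			// injection if none is already in flight.
			if routeInProgress {
				continue
			}
			routeInProgress = true
			inject(targets[0])
			targets = targets[1:]
		case t := <-injectionDone:
			fmt.Println("injected:", t)
			// Situation 2: an injection finished, so the flag is
			// reset before possibly starting the next injection.
			routeInProgress = false
			if len(targets) > 0 {
				routeInProgress = true
				inject(targets[0])
				targets = targets[1:]
			}
		}
	}
}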

@xaionaro (Contributor) commented Jul 30, 2020

This seems like overly difficult logic for a fairly simple task.

If I understand correctly, the task is: we want to forward targets from r.routingChannels.routeIn to r.routingChannels.stepIn without queueing (on r.routingChannels.routeIn). With timeouts and such, but that is not relevant here :)

A few questions about the current code:

  1. Why do we need the targets list at all? Isn't it just a duplication of channel logic (of routeIn in this case), which could queue things? Do we have this logic because we cannot dynamically control the size of the routeIn channel?
  2. Couldn't we just do something like this? -- https://play.golang.org/p/Kvki6Z2RNHm
  3. Is it possible to cover the expected behavior of this function with unit tests? Then I'd be able to propose my own implementation of the function :)

@xaionaro (Contributor)

Please let me know whether the comments on the code in the link above help. I can elaborate :)

@marcoguerri (Contributor, Author)

Sorry for the long delay; I had taken a note that I needed to reply to this, but only got around to it now :(

If I understand correctly, the task is: we want to forward targets from r.routingChannels.routeIn to r.routingChannels.stepIn without queueing (on r.routingChannels.routeIn).

Yes, this is the intention.

  1. We need the target list as a buffer so that incoming targets can be accepted as soon as possible. We could use a channel, but we would need that channel to resize automatically: we don't know beforehand the number of targets that we might want to enqueue. The proposal in 2. assumes a buffer size of 10; which size should we set here?
  2. Unit tests for this control path have already been added in this PR. In particular, TestRouteInRoutesAllTargets tests that all targets incoming from routeIn go out from stepIn (where the unit test is consuming targets). However, your solution would also pass this unit test. What I think is important is that anything writing to routeIn never blocks (because on the other side somebody is always reading from the channel). This was one of the assumptions when designing the pipeline, and the reason is that I wanted targets coming from plugins to be taken charge of as soon as possible. So if a plugin returns a target, the routing block needs to accept it without making the test step block, under any circumstance. This is made possible by the list, which acts as a buffer (see the sketch below). With a fixed-size channel, we might end up in a situation where we are blocked on a channel write, and if that is the contract with the plugins, I think it might make things harder to debug when things time out or we seem to run into a deadlock.
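A minimal sketch of this never-block contract under the stated assumption: an unbounded slice buffer sits between routeIn and stepIn, so whoever writes to routeIn is drained immediately, regardless of how slowly the next step consumes. The channel names are illustrative, not the actual test_runner fields:

package main

import "fmt"

// forward drains routeIn into an unbounded slice as soon as targets
// arrive, so a writer to routeIn never blocks, and feeds stepIn only
// when the consumer is ready to receive.
func forward(routeIn <-chan string, stepIn chan<- string) {
	var buf []string
	for routeIn != nil || len(buf) > 0 {
		// Enable the send case only when there is something to send.
		var out chan<- string
		var next string
		if len(buf) > 0 {
			out = stepIn
			next = buf[0]
		}
		select {
		case t, ok := <-routeIn:
			if !ok {
				routeIn = nil // stop receiving once upstream closes
				continue
			}
			buf = append(buf, t) // accept immediately, never block
		// A send on a nil channel blocks, so this case is disabled
		// while the buffer is empty.
		case out <- next:
			buf = buf[1:]
		}
	}
	close(stepIn)
}

func main() {
	routeIn := make(chan string)
	stepIn := make(chan string)
	go forward(routeIn, stepIn)

	go func() {
		for i := 0; i < 5; i++ {
			routeIn <- fmt.Sprintf("target%03d", i)
		}
		close(routeIn)
	}()

	for t := range stepIn {
		fmt.Println("step received:", t)
	}
}

The slice grows as needed, which is what the targets list achieves in the PR; a fixed-capacity channel cannot offer the same guarantee once its buffer fills.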

@xaionaro (Contributor) commented Aug 11, 2020

We don't know beforehand the number of targets that we might want to enqueue. The proposal in 2. assumes a buffer size of 10; which size should we set here?

  1. We do know how many targets we have acquired.
  2. Even if you remove the , 10 from the make(chan ...) in the proposal, it will still work correctly :)

With a fixed-size channel, we might end up in a situation where we are blocked on a channel write

This only means that nobody wants to receive anything from that channel yet (just as if you remove the , 10 in the proposal). Nothing is really solved by creating a buffer: you just move the queue from the initial point to another one.


But I think I understood what you mean. A pipeline consists of test steps, and we want to:

  • Keep the logic of test steps as simple as possible.
  • Finish a step as soon as possible.

Thus we want to read all targets from the previous step as soon as possible and push them to the next step at its convenience, so we have to create this temporary buffer or predict the queue size (to create channels of the required size). My thoughts here:

  1. I do not understand why it is important to finish a step as soon as possible. For neater logging? I'm not sure that justifies this much logic in such a critical part of the code.
  2. More importantly, we know the number of targets returned by the target manager, and we could have used that.
  3. Even if that is not possible, I still have a feeling it could be simplified. I could try to write an example against specific unit tests :)

But I think:

  • It is not worth such a thorough discussion.
  • Most likely I'm just missing something.

So let's just move on and leave it as is :)

@marcoguerri (Contributor, Author) commented Sep 10, 2020

Quick comment:

More importantly, we know the number of targets returned by the target manager, and we could have used that.

There are two complications:

  • Targets might fail while flowing through the pipeline, so we know only the maximum number.
  • Pipelines can be attached back to back. A second pipeline does not know how many targets will flow through it until the first pipeline has completed, so in this case we really need to wait for a signal from outside telling us that no more targets will come through.

I could use the upper bound in both cases; it's probably a better tradeoff.

Let me propose this: I'll write a solution using the maximum number of targets that could ever flow through the pipeline (sketched below), and we can discuss in a different PR whether that is a good enough tradeoff.

The complex part, though, comes from the code that reads the list and injects into the next step. Actually, the background goroutine, which I think was the approach you were proposing in the playground, should work. I'll need locking if I don't use channels (and I don't want to size channels to the upper bound), but that should work. A lot of time has passed, so this is a bit blurry in my mind; I'll take a note and give it a shot when I have some spare cycles.
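For illustration, a sketch of the upper-bound idea under the assumption that the maximum number of targets is known up front; the bound and names are hypothetical:

package main

import "fmt"

func main() {
	maxTargets := 4 // hypothetical upper bound known before the run

	// A channel sized to the upper bound can absorb every target
	// that could ever flow through, so writers never block even if
	// the reader is slow; the cost is reserving worst-case capacity.
	routeIn := make(chan string, maxTargets)

	for i := 0; i < maxTargets; i++ {
		routeIn <- fmt.Sprintf("target%03d", i) // never blocks
	}
	close(routeIn)

	for t := range routeIn {
		fmt.Println("routing:", t)
	}
}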

@xaionaro (Contributor) commented Jul 28, 2020

I'll write answers tomorrow.

Heh. I see you missed my comments, which were folded/hidden by GitHub behind the "9 hidden conversations" button :)

@marcoguerri (Contributor, Author)

I addressed the 9 missing ones later; I was going through the comments but had to stop for dinner, as people were waiting on me. The resolved ones have been addressed; those not resolved are still pending and might require further conversation. Push incoming.

@xaionaro (Contributor)

OK, I see you've answered the problem I was concerned about (close(r.routingChannels.routeOut)). Thank you for considering the proposed modifications. I will answer your replies tomorrow, but meanwhile, in my opinion, we can merge this. Even if something pops up during the discussion, I think it will be a minor thing that can be addressed afterwards.

@marcoguerri (Contributor, Author)

Let's sync tomorrow on what is missing and make sure we are aligned on expectations. It has been up for 2 months; it can wait one more day 😃

@xaionaro (Contributor)

Travis says:

pkg/runner/test_runner.go:220:6: printf: Warningf call has error-wrapping directive %w (govet)
					log.Warningf("could not emit %v event for target %+v: %w", targetInErrEv, *injectionResult.target, err)
					^
pkg/runner/test_runner.go:339:5: printf: Warningf call has error-wrapping directive %w (govet)
				log.Warningf("could not emit out event for target %v: %w", *t, err)
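For context on the warning: the %w verb is only an error-wrapping directive in fmt.Errorf, so govet's printf check flags it in other printf-style functions, and the usual fix is %v. A minimal sketch, using the standard library's log.Printf as a stand-in for the project's log.Warningf:

package main

import (
	"errors"
	"fmt"
	"log"
)

func main() {
	err := errors.New("event backend unavailable")

	// govet flags %w outside fmt.Errorf, e.g.:
	//   log.Printf("could not emit event: %w", err)
	// because only fmt.Errorf actually wraps errors.

	// The usual fix is to format the error with %v instead:
	log.Printf("could not emit event: %v", err)

	// %w stays reserved for building wrapped errors:
	wrapped := fmt.Errorf("emitting event: %w", err)
	fmt.Println(errors.Is(wrapped, err)) // true
}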

@marcoguerri (Contributor, Author) commented Jul 29, 2020

Fixed, but the linters we run in CI are still not aligned with the linters I run in my dev environment, so I'll need to fix that as well.

EDIT: now I also remember the reason behind the atomic operations: the race detector flags it as a data race. I will restore the atomic operations; as I mentioned above, that logic has been completely changed in upcoming PRs.
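As an aside, a minimal sketch of the kind of race the detector reports and the sync/atomic form that silences it; the counter is illustrative, not the actual test_runner state:

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var pending int64 // state shared across goroutines
	var wg sync.WaitGroup

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// A plain pending++ here is a read-modify-write that
			// `go test -race` reports as a data race.
			atomic.AddInt64(&pending, 1)
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&pending)) // always 100
}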

@marcoguerri marcoguerri merged commit 82708b5 into facebookarchive:master Jul 29, 2020
Labels
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet

3 participants