Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node-operator: fix data race in executor #2326

Merged
merged 4 commits into from
Sep 11, 2023
Merged

Conversation

elchead
Copy link
Contributor

@elchead elchead commented Sep 8, 2023

Context

A data race was found during unit tests in the CI pipeline:

188WARNING: DATA RACE
189Read at 0x00c0004fdd70 by goroutine 156:
190 github.com/edgelesssys/constellation/v2/operators/constellation-node-operator/v2/internal/executor.(*Executor).Trigger()
191 operators/constellation-node-operator/internal/executor/executor.go:178 +0x44
192 github.com/edgelesssys/constellation/v2/operators/constellation-node-operator/v2/internal/executor.(*Executor).Trigger-fm()
193 <autogenerated>:1 +0x1d
194 github.com/edgelesssys/constellation/v2/operators/constellation-node-operator/v2/sgreconciler.glob..func1.2.1()
195 operators/constellation-node-operator/sgreconciler/scalinggroup_controller_env_test.go:57 +0x234
196 github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3()
197 external/com_github_onsi_ginkgo_v2/internal/node.go:463 +0x30
198 github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
199 external/com_github_onsi_ginkgo_v2/internal/suite.go:865 +0x114
200
201Previous write at 0x00c0004fdd70 by goroutine 153:
202 github.com/edgelesssys/constellation/v2/operators/constellation-node-operator/v2/internal/executor.(*Executor).Start()
203 operators/constellation-node-operator/internal/executor/executor.go:87 +0x1bc
204 github.com/edgelesssys/constellation/v2/operators/constellation-node-operator/v2/sgreconciler.glob..func2.1()
205 operators/constellation-node-operator/sgreconciler/suite_test.go:95 +0x6c
206
207Goroutine 156 (running) created at:
208 github.com/onsi/ginkgo/v2/internal.(*Suite).runNode()
209 external/com_github_onsi_ginkgo_v2/internal/suite.go:852 +0x1359
210 github.com/onsi/ginkgo/v2/internal.(*group).attemptSpec()
211 external/com_github_onsi_ginkgo_v2/internal/group.go:199 +0x1144
212 github.com/onsi/ginkgo/v2/internal.(*group).run()
213 external/com_github_onsi_ginkgo_v2/internal/group.go:346 +0x1147
214 2023-09-05T12:40:43Z INFO reconciling external scaling groups
215 github.com/onsi/ginkgo/v2/internal.(*Suite).runSpecs()
216 external/com_github_onsi_ginkgo_v2/internal/suite.go:465 +0x10ee
217 github.com/onsi/ginkgo/v2/internal.(*Suite).Run()
218 external/com_github_onsi_ginkgo_v2/internal/suite.go:116 +0x615
219 github.com/onsi/ginkgo/v2.RunSpecs()
220 external/com_github_onsi_ginkgo_v2/core_dsl.go:319 +0x1724
221 github.com/edgelesssys/constellation/v2/operators/constellation-node-operator/v2/sgreconciler.TestAPIs()
222
223(12:40:46) INFO: From Testing //operators/constellation-node-operator/sgreconciler:sgreconciler_test:
224(12:40:46) [6,022 / 7,767] 26 / 128 tests, 1 failed; 7 actions, 6 running; last test: .../sgreconciler:sgreconciler_test
225 Testing //operators/constellation-node-operator/sgreconciler:sgreconciler_test; 13s remote-cache, processwrapper-sandbox
226 GoLink .../measurement-generator/measurement-generator_test_/measurement-generator_test; 13s remote-cache, processwrapper-sandbox
227 GoLink measurement-reader/internal/sorted/sorted_test_/sorted_test; 9s remote-cache, processwrapper-sandbox
228 GoCompilePkg external/com_github_go_playground_validator_v10/validator.a; Downloading external/com_github_go_playgrou...; 0s remote-cache
229 operators/constellation-node-operator/sgreconciler/suite_test.go:52 +0x245
230 testing.tRunner()
231 GOROOT/src/testing/testing.go:1576 +0x216
232 testing.(*T).Run.func1()
233 GOROOT/src/testing/testing.go:1629 +0x47
234
235Goroutine 153 (finished) created at:
236 github.com/edgelesssys/constellation/v2/operators/constellation-node-operator/v2/sgreconciler.glob..func2()
237 operators/constellation-node-operator/sgreconciler/suite_test.go:93 +0xf44
238 github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3()
239 external/com_github_onsi_ginkgo_v2/internal/node.go:463 +0x30
240 github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
241 external/com_github_onsi_ginkgo_v2/internal/suite.go:865 +0x114

https://github.com/edgelesssys/constellation/actions/runs/6084678042/job/16507108767?pr=2287#step:7:189

Concurrent access of channels is safe, but the initialization of the channel itself is not
(see ref to operators/constellation-node-operator/internal/executor/executor.go:87 +0x1bc).

Proposed change(s)

  • move the channel initialization to the constructor
  • enforce usage of the constructor for correct initialization

Checklist

  • Update docs
  • Add labels (e.g., for changelog category)
  • Is PR title adequate for changelog?
  • Link to Milestone

@netlify
Copy link

netlify bot commented Sep 8, 2023

Deploy Preview for constellation-docs ready!

Name Link
🔨 Latest commit 890d73e
🔍 Latest deploy log https://app.netlify.com/sites/constellation-docs/deploys/64fc7a361c9d9b0008efcca5
😎 Deploy Preview https://deploy-preview-2326--constellation-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@elchead elchead changed the title node-operator: fix datarace in executor node-operator: fix data race in executor Sep 8, 2023
@elchead elchead added the bug fix Fixing a bug label Sep 8, 2023
@elchead elchead marked this pull request as ready for review September 8, 2023 14:29
@elchead elchead requested a review from malt3 as a code owner September 8, 2023 14:29
@elchead elchead added this to the v2.11.0 milestone Sep 8, 2023
@malt3
Copy link
Contributor

malt3 commented Sep 8, 2023

The change looks good to me.
There is actually a atomic bool that should guard the Start method.

Small nit: Can you also remove the e.running.Store(false)? This way it's actually impossible to call Start twice.

@elchead
Copy link
Contributor Author

elchead commented Sep 8, 2023

There is actually a atomic bool that should guard the Start method.

It guards starting concurrently, but it doesn't guard against (Start + Trigger). That was apparently the problem due to the channel initialization

@elchead
Copy link
Contributor Author

elchead commented Sep 9, 2023

@malt3 there is a problem when remove the running = false to prevent starting the controller twice in the TestContextCancel (since it keeps running forever then). afais, the changed code also doesn't let Start() fail when running it a 2nd time. I moved the defer code back in. Is that ok?

@elchead elchead merged commit b3bb486 into main Sep 11, 2023
9 checks passed
@elchead elchead deleted the fix/node-operator/data-race branch September 11, 2023 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix Fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants