You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At present, the Argo CD unit tests run as part of every CI build, but they do not run with with -race parameter which may be used to detect data races.
Since data races, in production applications, may lead to application instability and crashes, it's beneficial to ensure that we prevent data races from being introduced into Argo CD where possible. An easy way of improving our coverage here is what I propose below.
I've already fixed/opened PRs for a number of existing data races:
While some of them have existed for a while, a number of them were introduced fairly recently, and could have been caught if we were running the unit tests with -race as part of PR CI builds.
Proposal
I'm proposing:
Add a new make test-race entry to the makefile, to run the (supported) tests with -race
Segregate tests that currently fail with -race enabled, such that they do not run during make test-race, and ONLY run during make test
Failing tests are seggregated by moving them into a new test file, suffixed with _norace_test.go
Q: Are all the tests passing with data race detection enabled?
Unfortunately there are a few tests that are still throwing data races during unit tests, and thus I have moved them into _norace_test.go files so that they don't run during the -race tess pass. I've filed issues for a few, and PRs for the ones mentioned above... here are some examples:
TestUserAgent/Test_StaticHeaders in server/server_test.go
A data race in go-client's shared_informer.go, between sharedProcessor.run(...) and itself. Based on the data race, it APPEARS to be expected, but in any case it's nothing we are doing in Argo CD AFAICT that is causing this issue.
TestRandomPasswordVerificationDelay in util/session/sessionmanager_test.go
SessionManager.VerifyUsernamePassword uses bcrypt to prevent against time-based attacks and verify the hashed password; however this is a CPU intensive algorithm that is made significantly slower due to data race detection being enabled, which breaks through the maximum time limit required by TestRandomPasswordVerificationDelay.
TestResourceActionWildcards/TestPolicyInformer in util/rbac/rbac_test.go
A BUNCH of data race warnings thrown by running this test package, and it's tough to guess to what degree this is primarily a casbin issue or an Argo CD RBAC issue... at least one data race is in rbac.go with itself, and a bunch are in casbin. You can see the full list by doing a go test -race github.com/argoproj/argo-cd/util/rbac
It couldn't hurt to take a look at this code to decide if Argo CD is properly handling concurrent data access here, but in the mean time I have disabled data race testing of these tests.
Q: What if it makes the build unreliable?
I've stress tested the unit tests by running go test -race in a continous loop, and any tests that failed due to a race condition during the process were moved to _norace_ files. (with the only exception being those couple of test packages that fail when run too closely together, due to to github repo request limits)
I've likewise hit the GitHub 'rerun action' button a bunch of times on the newly introduced go test -race job, to reduce the potential for intermittent issues only reproducible from within GitHub internal runners.
If we do discover it makes the build unreliable, we should definitely segregate those affected tests, but as above I'm not expecting many issues there.
The text was updated successfully, but these errors were encountered:
#4774) (#4775)
* chore: Add a GitHub action that runs unit tests with -race to CI build (#4774)
Signed-off-by: Jonathan West <[email protected]>
* chore: Add a GitHub action that runs unit tests with -race to CI build (#4774)
Signed-off-by: Jonathan West <[email protected]>
Summary
At present, the Argo CD unit tests run as part of every CI build, but they do not run with with
-race
parameter which may be used to detect data races.Since data races, in production applications, may lead to application instability and crashes, it's beneficial to ensure that we prevent data races from being introduced into Argo CD where possible. An easy way of improving our coverage here is what I propose below.
I've already fixed/opened PRs for a number of existing data races:
While some of them have existed for a while, a number of them were introduced fairly recently, and could have been caught if we were running the unit tests with
-race
as part of PR CI builds.Proposal
I'm proposing:
make test-race
entry to the makefile, to run the (supported) tests with-race
-race
enabled, such that they do not run duringmake test-race
, and ONLY run duringmake test
_norace_test.go
// +build !race
at the top of the source file which prevents them from being run when-race
is enabled.go-test-race
job to theci-build.yaml
as a check to run on every PR.See the corresponding PR for this issue, as an example of how all of the above looks in practice.
Feedback definitely welcome!
Additional Questions Answered
Q: Are all the tests passing with data race detection enabled?
Unfortunately there are a few tests that are still throwing data races during unit tests, and thus I have moved them into
_norace_test.go
files so that they don't run during the-race
tess pass. I've filed issues for a few, and PRs for the ones mentioned above... here are some examples:TestUserAgent/Test_StaticHeaders in server/server_test.go
A data race in go-client's
shared_informer.go
, betweensharedProcessor.run(...)
and itself. Based on the data race, it APPEARS to be expected, but in any case it's nothing we are doing in Argo CD AFAICT that is causing this issue.TestRandomPasswordVerificationDelay in util/session/sessionmanager_test.go
SessionManager.VerifyUsernamePassword
uses bcrypt to prevent against time-based attacks and verify the hashed password; however this is a CPU intensive algorithm that is made significantly slower due to data race detection being enabled, which breaks through the maximum time limit required byTestRandomPasswordVerificationDelay
.TestResourceActionWildcards/TestPolicyInformer in util/rbac/rbac_test.go
A BUNCH of data race warnings thrown by running this test package, and it's tough to guess to what degree this is primarily a casbin issue or an Argo CD RBAC issue... at least one data race is in
rbac.go
with itself, and a bunch are in casbin. You can see the full list by doing ago test -race github.com/argoproj/argo-cd/util/rbac
It couldn't hurt to take a look at this code to decide if Argo CD is properly handling concurrent data access here, but in the mean time I have disabled data race testing of these tests.
Q: What if it makes the build unreliable?
I've stress tested the unit tests by running
go test -race
in a continous loop, and any tests that failed due to a race condition during the process were moved to_norace_
files. (with the only exception being those couple of test packages that fail when run too closely together, due to to github repo request limits)I've likewise hit the GitHub 'rerun action' button a bunch of times on the newly introduced
go test -race
job, to reduce the potential for intermittent issues only reproducible from within GitHub internal runners.If we do discover it makes the build unreliable, we should definitely segregate those affected tests, but as above I'm not expecting many issues there.
The text was updated successfully, but these errors were encountered: