
pool: add pool raciness tests, fix p.usedTotal house-keeping #1340

Merged
merged 8 commits into thanos-io:master on Jul 23, 2019

Conversation

GiedriusS
Member

@GiedriusS GiedriusS commented Jul 19, 2019

I have noticed with these tests that the p.usedTotal value keeps
jumping like this:

6
3
6
6
18446744073709551604
3
18446744073709551607
18446744073709551604

It could happen that its value underflows, as the tests show:

unexpected error: goroutine barbazbaz: pool exhausted

This happens when a caller does something like Get(3) and then makes the slice larger by appending more items to it. When the slice is returned, we subtract its capacity from p.usedTotal, and that can underflow. It can also happen because there is no synchronization between different users.
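
To make the failure mode concrete, here is a minimal, self-contained sketch (a simplified stand-in, not the actual pool code) of how growing a slice obtained from the pool and then returning it can underflow an unsigned usage counter:

```
package main

import "fmt"

func main() {
	var usedTotal uint64

	// Get(3): account for a slice with capacity 3.
	b := make([]byte, 0, 3)
	usedTotal += uint64(cap(b)) // usedTotal == 3

	// The caller grows the slice beyond its original capacity.
	b = append(b, make([]byte, 10)...) // cap(b) is now >= 10

	// Put(b): subtracting the current (larger) capacity wraps the unsigned
	// counter around, producing huge values like the ones quoted above.
	usedTotal -= uint64(cap(b))
	fmt.Println(usedTotal)
}
```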

Furthermore, let's fix the house-keeping of p.usedTotal properly by no longer subtracting p.usedTotal from itself whenever Put is called. I am not sure that it ever worked correctly.

Also, without these changes `go test -race` reports:

==================
WARNING: DATA RACE
Read at 0x00c0000c0538 by goroutine 11:
  github.com/improbable-eng/thanos/pkg/pool.(*BytesPool).Put()
      /home/gstatkevicius/dev/thanos/pkg/pool/pool.go:93 +0x152
  github.com/improbable-eng/thanos/pkg/pool.TestRacePutGet.func1()
      /home/gstatkevicius/dev/thanos/pkg/pool/pool_test.go:103 +0x9a

Previous write at 0x00c0000c0538 by goroutine 12:
  sync/atomic.AddInt64()
      /usr/local/go/src/runtime/race_amd64.s:276 +0xb
  github.com/improbable-eng/thanos/pkg/pool.(*BytesPool).Get()
      /home/gstatkevicius/dev/thanos/pkg/pool/pool.go:71 +0x186
  github.com/improbable-eng/thanos/pkg/pool.TestRacePutGet.func1()
      /home/gstatkevicius/dev/thanos/pkg/pool/pool_test.go:80 +0xe4

Goroutine 11 (running) created at:
  github.com/improbable-eng/thanos/pkg/pool.TestRacePutGet()
      /home/gstatkevicius/dev/thanos/pkg/pool/pool_test.go:109 +0x2ca
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163

Goroutine 12 (running) created at:
  github.com/improbable-eng/thanos/pkg/pool.TestRacePutGet()
      /home/gstatkevicius/dev/thanos/pkg/pool/pool_test.go:110 +0x2f5
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163
==================

After these changes, the race detector doesn't complain anymore.

@GiedriusS GiedriusS marked this pull request as ready for review July 19, 2019 10:38
@GiedriusS GiedriusS changed the title from "pool: add pool raciness tests, fix p.usedTotal keeping" to "pool: add pool raciness tests, fix p.usedTotal house-keeping" on Jul 19, 2019
@GiedriusS
Member Author

Tests work well on my laptop but fail on Circle CI. Need to investigate 😖

We still want to modify `p.usedTotal` if the slice's capacity is outside
of any bucket.
@GiedriusS GiedriusS requested a review from bwplotka July 22, 2019 12:59
@povilasv
Member

I think this used to work because previously the return value was a []byte slice, where you can't really grow the slice as you can't change the slice header. (https://github.com/thanos-io/thanos/pull/1227/files)

Since we changed it to *[]byte, IMO we should just add docs saying that users of the pool shouldn't make the byte slices larger, and that if they do, the outcome might be unexpected.
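
To illustrate the slice-header point, here is a minimal sketch of plain Go slice semantics (illustrative only, not the pool's code): appending to a slice received by value only changes the callee's local copy of the header, while appending through a *[]byte updates the header the caller, and later Put, will see.

```
package main

import "fmt"

// growValue appends to a slice value; only this local copy of the header changes.
func growValue(b []byte) {
	b = append(b, make([]byte, 10)...)
	_ = b
}

// growPointer appends through a pointer; the caller's slice header is updated.
func growPointer(pb *[]byte) {
	*pb = append(*pb, make([]byte, 10)...)
}

func main() {
	a := make([]byte, 0, 3)
	growValue(a)
	fmt.Println(cap(a)) // still 3

	b := make([]byte, 0, 3)
	growPointer(&b)
	fmt.Println(cap(b)) // now >= 10
}
```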

IMO atomic is preferable to a mutex in this case as it's more efficient. So I would keep the old code and just update the docs and modify the tests.

@GiedriusS
Member Author

I agree that from a performance perspective it would be more beneficial to stay with the old code which uses atomic. So if it's okay with everyone, let's document that assumption very clearly and fix the p.usedTotal house-keeping so that we no longer subtract that number from itself?

@povilasv
Member

povilasv commented Jul 23, 2019

👍

Basically `atomic.AddUint64(&p.usedTotal, ^uint64(p.usedTotal-1))` -> `atomic.AddUint64(&p.usedTotal, ^uint64(sz-1))`, and then docs + tests?
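
For context, `^uint64(x-1)` is the wrap-around way of expressing a decrement by x when only `atomic.AddUint64` is available, so the suggested change subtracts the slice's size instead of the counter's own current value. A standalone sketch of the decrement idiom (not the pool code itself):

```
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var usedTotal uint64 = 100
	sz := uint64(30)

	// AddUint64 can only add, so a decrement by sz is written as adding
	// ^uint64(sz-1), which equals -sz under unsigned wrap-around.
	atomic.AddUint64(&usedTotal, ^uint64(sz-1))

	fmt.Println(atomic.LoadUint64(&usedTotal)) // 70
}
```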

@GiedriusS
Member Author

Yes, if that's OK with you?

@povilasv
Member

Yes, SGTM 🥇

Member

@bwplotka bwplotka left a comment


It's sad that we are locking the underlying sync.Pool as it is already goroutine-safe, but we definitely want to have a solid limit on usage - many races were possible here indeed.

Thanks! LGTM!

@bwplotka bwplotka merged commit 58fd87f into thanos-io:master Jul 23, 2019
@GiedriusS
Member Author

GiedriusS commented Jul 23, 2019

@bwplotka have you taken a look at the previous comments posted by @povilasv? 👀 Looking at this again, AFAICT it is impossible to properly maintain p.usedTotal with atomics, since two goroutines can be in Put() or Get() at the same time and, because there's no lock, inconsistencies can arise.

@povilasv
Member

Why not? I can't see how that wouldn't work :D I mean, a couple of goroutines would do Get and at a later point they would return the slices with Put; as we are adding the numbers atomically, shouldn't it all converge?

@GiedriusS
Member Author

Because we are reading that value into a separate variable and it can change from underneath us since there's no locking, right @povilasv?

@GiedriusS
Member Author

GiedriusS commented Jul 24, 2019

It won't necessarily converge to 0 since, for example, one goroutine will read that there are X bytes used and then set it to X-Y, but another one will already have X in its temporary variable as well and will set it to X-Z. So the first update is lost unless we do this sequentially. Does that make sense? We could sprinkle more checks all over the code so that we wouldn't need a temporary variable, but I am not sure that it would really be faster than holding a mutex. Sorry for being terse, writing on my phone.
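
A minimal sketch of that lost-update pattern (a standalone illustration with hypothetical names, not the pool code): every individual load and store is atomic, yet the read-modify-write as a whole is not, so concurrent updates can overwrite each other.

```
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var used uint64 = 1000
	var wg sync.WaitGroup

	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// The value can change between the Load and the Store,
			// so one goroutine's update may be lost.
			x := atomic.LoadUint64(&used)
			atomic.StoreUint64(&used, x-10)
		}()
	}
	wg.Wait()

	// 0 if all 100 decrements of 10 were applied; usually larger,
	// because some updates were overwritten.
	fmt.Println(atomic.LoadUint64(&used))
}
```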

@povilasv
Member

povilasv commented Jul 24, 2019

Yeah, it does. Then a mutex makes sense.

@GiedriusS
Member Author

GiedriusS commented Jul 24, 2019

Sorry, wanted to add one last thing that came to my mind: I think the problem here is that we need to atomically compare one number with another and then make a decision based on that - should we allocate or not? With atomics it's impossible to do that in a single step - we still need to load the value into a separate variable and write if .... However, a problem lies there: it's a classical set-and-get race. I think we should really stay with this PR as-is.
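
A minimal sketch of that compare-then-decide step done under a mutex (hypothetical, simplified code; the type, fields, and method names are illustrative, not the actual pool implementation): the limit check and the counter update happen in one critical section, which a single atomic add cannot express.

```
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errExhausted = errors.New("pool exhausted")

// limitedCounter is an illustrative stand-in for the pool's usage accounting.
type limitedCounter struct {
	mtx   sync.Mutex
	used  uint64
	limit uint64
}

// reserve checks the limit and updates the counter under one lock,
// so no other goroutine can interleave between the check and the update.
func (c *limitedCounter) reserve(sz uint64) error {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if c.used+sz > c.limit {
		return errExhausted
	}
	c.used += sz
	return nil
}

func main() {
	c := &limitedCounter{limit: 10}
	fmt.Println(c.reserve(6)) // <nil>
	fmt.Println(c.reserve(6)) // pool exhausted
}
```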

@povilasv
Member

👍

paulfantom added a commit to paulfantom/thanos that referenced this pull request Jul 26, 2019
* master:
  Fix typos (thanos-io#1350)
  pool: add pool raciness tests, fix p.usedTotal house-keeping (thanos-io#1340)
  runutil: add Exhaust* fns, initial users (thanos-io#1302)
  Further docs rename to new org; Improved docker login. (thanos-io#1344)
  Moved CI to thanos-io org and docker image to quay.io/thanos/thanos (thanos-io#1342)
  thanos-io (thanos-io#1343)
  website: migrate to new Github Org (thanos-io#1341)