Potential data race #1092

Closed
typeless opened this issue Dec 19, 2018 · 2 comments

@typeless

#1091 contains sample code that can produce a data race.
I have made several attempts to fix it but could not reach a satisfying result.
Please have a look.

@mschoch
Contributor

mschoch commented Feb 1, 2019

Thanks for reporting this; I have taken a look.

  1. First, although the test you provided demonstrated the problem in upsidedown, I suspected it might affect both upsidedown and scorch (or at least that we should test both to see). I adapted your test to run one level higher, at the top-level bleve package.

  2. The results were a bit surprising, because we had already encountered a similar bug in the past (Data race in Batch.Reset() #260). Further, we added a test for it at the time, and that test passes with the race detector (for both upsidedown and scorch).

  3. So what is different about your test? Notably, you're indexing empty batches. In fact, if you add a line to your test to index a document each time, the race detector passes. So it was the combination of indexing empty batches and reusing the batch that triggered the race detector (see the sketch below).
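For reference, here is a minimal sketch of the kind of loop that exercises this, using the top-level bleve API (not the exact test from #1091; the mapping and in-memory index setup are my own assumptions):

```go
package main

import "github.com/blevesearch/bleve"

func main() {
	// Build a small in-memory index; a disk-backed upsidedown or scorch
	// index shows the same behavior.
	mapping := bleve.NewIndexMapping()
	idx, err := bleve.NewMemOnly(mapping)
	if err != nil {
		panic(err)
	}
	defer idx.Close()

	batch := idx.NewBatch()
	for i := 0; i < 100; i++ {
		// The batch contains no documents, so (before the fix) the
		// internal goroutine that walks it never synchronizes with us...
		if err := idx.Batch(batch); err != nil {
			panic(err)
		}
		// ...and reusing the batch here is what the race detector flags.
		batch.Reset()
	}
}
```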

What I've determined is that when there are documents in the batch, we internally synchronize by going over a channel, so the race detector can prove the access is safe. However, when there are no documents in the batch, the loop runs zero times, we never go through that channel synchronization, and the race detector correctly points out the bug.
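To illustrate the shape of the problem, here is a simplified sketch (not the actual upsidedown code; the types and function are made up for illustration):

```go
package sketch

// Batch mirrors the relevant shape: a map of pending operations.
type Batch struct {
	IndexOps map[string]interface{} // updates and deletes keyed by doc ID
}

// Reset clears the batch for reuse; it runs on the caller's goroutine.
func (b *Batch) Reset() { b.IndexOps = map[string]interface{}{} }

// indexBatch mimics the racy pattern: a worker goroutine walks the
// batch while the caller waits for one result per entry on a channel.
func indexBatch(b *Batch) {
	results := make(chan struct{})

	go func() {
		for range b.IndexOps { // reads b.IndexOps on another goroutine
			results <- struct{}{}
		}
	}()

	for range b.IndexOps { // zero receives when the batch is empty
		<-results
	}
	// With an empty batch there was no channel operation at all, so
	// nothing orders the goroutine's read of b.IndexOps before a later
	// b.Reset() on the caller's side -- exactly what the detector flags.
}
```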

It turns out the fix is straightforward: we can eliminate the unsynchronized access by the other goroutine in the case where there are no documents, thus removing the race.
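Continuing the sketch above, the fix amounts to never handing an empty batch to another goroutine at all (again a simplification of the approach described here, not the actual PR):

```go
// indexBatchFixed avoids touching the batch from another goroutine when
// there is nothing to analyze.
func indexBatchFixed(b *Batch) {
	if len(b.IndexOps) == 0 {
		return // nothing to do; no other goroutine ever sees the batch
	}

	results := make(chan struct{})
	go func() {
		for range b.IndexOps {
			results <- struct{}{}
		}
	}()
	for range b.IndexOps {
		<-results // at least one receive, so the goroutines synchronize
	}
}
```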

I'll be submitting a PR to address this issue shortly.

mschoch added the bug label Feb 1, 2019
@mschoch
Contributor

mschoch commented Feb 1, 2019

Fixed by #1121

mschoch closed this as completed Feb 1, 2019
mschoch added a commit that referenced this issue Mar 3, 2019
this is another variation of the race found/fixed in #1092.
in that case the batch was empty, which meant we would skip
the code that properly synchronized access.  our fix only
handled that exact case (no data operations). however,
there is another variation: if the batch contains only deletes
(which are data ops), we still spawned the goroutine, and
since there were no real updates, the synchronization
code was again skipped, so the data race could still happen.

the fix is to check the number of updates (computed earlier on
the caller's goroutine, so it's safe) instead of the length
of the IndexOps (which includes both updates and deletes).

the key is that we should only spawn the goroutine that will
range over the batch in cases where we will synchronize by
waiting for the analysis to complete (at least one update).

fixes #1149
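As a rough illustration of that guard, continuing the earlier sketch (variable names follow the commit message; the convention that a nil value marks a delete, and the rest of the structure, are assumptions):

```go
// indexBatchRefined spawns the worker only when the caller will actually
// wait on it, i.e. when there is at least one update. A delete-only
// batch is never handed to another goroutine.
func indexBatchRefined(b *Batch) {
	// numUpdates is computed here, on the caller's goroutine, before
	// anything is handed off, so reading b.IndexOps is still safe.
	numUpdates := 0
	for _, op := range b.IndexOps {
		if op != nil { // assuming a nil value marks a delete
			numUpdates++
		}
	}

	results := make(chan struct{})
	if numUpdates > 0 { // was: len(b.IndexOps) > 0, also true for delete-only batches
		go func() {
			for _, op := range b.IndexOps {
				if op != nil {
					results <- struct{}{} // one signal per analyzed update
				}
			}
		}()
	}

	for i := 0; i < numUpdates; i++ {
		<-results
	}
	// deletes are applied here on the caller's goroutine (omitted)
}
```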