Describe the bug
During primary-primary relocation, we are seeing data loss when indexing at high TPS. The loss starts after initiateTracking is called for the new primary shard: a subset of docs is missing once relocation completes. After the relocation handoff completes, indexing requests that land on the new primary shard use the correct seq no, but the overall doc count is still incorrect.
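One way to quantify the missing docs is to refresh the index and compare the `_count` result with the number of docs sent. The following is only a sketch: `HOST` and `INDEX` default to the values used in the reproduction script, while `EXPECTED`, `RUN_CHECK`, `extract_count`, and `missing_docs` are hypothetical names introduced for illustration. It assumes the default compact JSON response from `_count`.

```shell
#!/usr/bin/env bash
# Sketch only: EXPECTED, RUN_CHECK and the helper names are illustrative.
HOST="${HOST:-localhost:9202}"
INDEX="${INDEX:-test-index}"

# Pull the numeric "count" field out of a _count response read from stdin.
extract_count() {
  sed -n 's/.*"count":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print how many docs are missing, given the expected and the actual count.
missing_docs() {
  echo $(( $1 - $2 ))
}

# Only contact the cluster when RUN_CHECK=1, so the helpers can be tried offline.
if [ "${RUN_CHECK:-0}" = "1" ]; then
  # Refresh first so that every indexed doc is visible to _count.
  curl -sS -X POST "$HOST/$INDEX/_refresh" > /dev/null
  actual="$(curl -sS "$HOST/$INDEX/_count" | extract_count)"
  echo "missing: $(missing_docs "${EXPECTED:-1000}" "$actual")"
fi
```

Running it with `RUN_CHECK=1` after relocation completes should print `missing: 0` on a healthy index; any positive number is the size of the lost subset.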
To Reproduce
Step 1 - Create a SegRep index with index.translog.durability set to async or request. The issue reproduces more easily with async.
Step 2 - Index docs
for i in {1..1000}
do
curl --location --request POST "localhost:9202/test-index/_doc" \
--header 'Content-Type: application/json' \
--data-raw "{
\"name\":\"abc${i}\"
}"
echo "$i"
done
Step 3 - Start relocation of index
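For reference, the index creation and the relocation could be driven with something like the sketch below. It is not taken from the original report: it assumes a single-shard index, uses the `index.replication.type` and `index.translog.durability` settings, and moves shard 0 with a `_cluster/reroute` `move` command. `FROM_NODE`, `TO_NODE`, `RUN`, and the helper function names are hypothetical and need to be adapted to the cluster at hand.

```shell
#!/usr/bin/env bash
# Sketch only: FROM_NODE, TO_NODE, RUN and the helper names are illustrative.
HOST="${HOST:-localhost:9202}"
INDEX="${INDEX:-test-index}"

# Request body for a segment-replication index with async translog durability.
# A single shard is an assumption made so that shard 0 is the one to relocate.
index_settings() {
  printf '%s\n' '{"settings":{"index.number_of_shards":1,"index.replication.type":"SEGMENT","index.translog.durability":"async"}}'
}

# Request body for a _cluster/reroute "move" of shard 0 from node $1 to node $2.
move_command() {
  printf '{"commands":[{"move":{"index":"%s","shard":0,"from_node":"%s","to_node":"%s"}}]}\n' "$INDEX" "$1" "$2"
}

# Only contact the cluster when RUN=1, so the request bodies can be inspected offline.
if [ "${RUN:-0}" = "1" ]; then
  curl -sS -X PUT "$HOST/$INDEX" -H 'Content-Type: application/json' -d "$(index_settings)"
  # Start the indexing loop here (for example in the background), then relocate:
  curl -sS -X POST "$HOST/_cluster/reroute" -H 'Content-Type: application/json' \
    -d "$(move_command "$FROM_NODE" "$TO_NODE")"
fi
```

The relocation has to start while the indexing loop is still running, since the loss is only reported under concurrent indexing.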
I had found an issue with relocation for remote-backed indexes, #6214, and have validated that the same issue exists for segrep indexes. The fix for remote-backed indexes is in #6314; I have validated it for segrep as well and it seems to work. Please feel free to start from that fix, and we can also reason through any alternate approaches. cc @dreamer-89 @mch2
Thanks for raising this @ashking94. We've been discussing this on #6065, as it is a cause of some of the flakiness in our relocation ITs. One addition I mentioned on #6065 that I like for SR is to execute a refresh before we do the round of SR for the new primary during relocation. I like this change not only for relocation but also for failover scenarios, to guarantee we are not leaving ops in the translog. I will include a test for that while concurrently indexing.