Fix count of finished runs #1428
base: master
This may be slow without the appropriate indexes. But we’ll see. |
DEV updated. |
@dav1312 I'm commuting now and not able to test code. I wonder if the correct count of the documents is simply |
As expected it is very slow. The indexes (created in server/utils/create_indexes.py) should have a partial filter expression |
Something like this?
db["runs"].create_index(
    [("finished", ASCENDING), ("last_updated", DESCENDING)],
    name="finished_runs",
-   partialFilterExpression={"finished": True},
+   partialFilterExpression={"finished": True, "deleted": False},
)
|
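A minimal sketch of how the proposed partial index could be created and sanity-checked without a live database. `create_finished_runs_index` and `FakeCollection` are hypothetical names used only for illustration; the `name` and `partialFilterExpression` keyword arguments are the ones from the snippet above, and the two constants mirror the values of `pymongo.ASCENDING` and `pymongo.DESCENDING`.

```python
# Values of pymongo.ASCENDING / pymongo.DESCENDING, inlined so the sketch
# runs without pymongo installed.
ASCENDING = 1
DESCENDING = -1


def create_finished_runs_index(collection):
    """Create the partial index discussed above on a pymongo-like collection."""
    return collection.create_index(
        [("finished", ASCENDING), ("last_updated", DESCENDING)],
        name="finished_runs",
        # Only documents matching this filter are stored in the index.
        partialFilterExpression={"finished": True, "deleted": False},
    )


class FakeCollection:
    """Hypothetical stand-in that records the call instead of talking to MongoDB."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys, **kwargs):
        self.calls.append((keys, kwargs))
        return kwargs.get("name")


fake = FakeCollection()
index_name = create_finished_runs_index(fake)
```

With a real database one would pass `db["runs"]` instead of the fake. One thing worth checking, as an assumption about the data rather than something stated in the thread: the filter `{"deleted": False}` only matches documents where the field is present and false, whereas `not run.get("deleted")` also accepts runs that have no `deleted` field at all.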
Can we get the two stats above for DEV again with master? My guess is that querying with such a condition should always be faster than filtering in code. |
I think (but I am not sure) that you also need to index on the |
I suspect that something green + ltc would need an index since it is the intersection of two large sets, hence not easy to compute by scanning. |
To me 300ms seems fine for a complex query (if I understand your numbers correctly). |
I'm taking the values from freemonitoring |
Bench of the first proposed code (no indexes), master and PR are on par (
#!/usr/bin/env python3
import time

from fishtest.rundb import RunDb
from pymongo import DESCENDING


def main():
    rdb = RunDb()
    limit = 25
    samples = 100

    # Variant 1: exclude deleted runs in the query itself.
    q = {"$and": [{"finished": True}, {"deleted": False}]}
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i * limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = list(c)
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))

    # Variant 2: query all finished runs, then drop deleted ones in Python.
    q = {"finished": True}
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i * limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = [run for run in c if not run.get("deleted")]
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))


if __name__ == "__main__":
    main()
|
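The two timing loops in the script differ only in the call being measured, so they could share one helper. The following is a sketch under that assumption; `mean_elapsed` and `fetch_page` are hypothetical names, and the in-memory `fetch_page` merely stands in for a `rdb.runs.find(...)` page fetch so the example runs without a database.

```python
import time


def mean_elapsed(fn, samples):
    """Return the mean wall-clock seconds per call of fn(i) for i in range(samples)."""
    start = time.perf_counter()
    for i in range(samples):
        fn(i)
    return (time.perf_counter() - start) / samples


# Hypothetical in-memory workload standing in for one page fetch.
pages_fetched = []


def fetch_page(i, limit=25):
    pages_fetched.append(list(range(i * limit, (i + 1) * limit)))


avg = mean_elapsed(fetch_page, samples=100)
print("elapsed time: {}".format(avg))
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring intervals.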
DEV with new indexes on The bench of the new query with the indexes doesn't improve (elapsed time: 0.22787032127380372)
|
You mean the new indexes with master? |
This code is very slow with the indexes (perhaps that is normal after adding the indexes).
q = {"finished": True}
beg_ts = time.time()
for i in range(samples):
    c = rdb.runs.find(q, skip=i * limit, limit=limit, sort=[("last_updated", DESCENDING)])
    runs_list = [run for run in c if not run.get("deleted")]
end_ts = time.time()
print("elapsed time: {}".format((end_ts - beg_ts) / samples))
|
Query the database instead of filtering the runs afterwards. This ensures that the count of finished, non-deleted runs is correct. This also fixes the pagination: some pages ended up with fewer than page_size runs.
DEV and indexes updated, here is the bench:
The pagination on DEV is very slow. |
So... do I add the new indexes or not? ☹ |
I think one should use |
Perhaps this is dumb, but why do we call |
Also, essentially by necessity, fix the num_finished_runs count which is inaccurate in master, see also official-stockfish#1428
Also, essentially by necessity, fix the num_finished_runs count, which is inaccurate in master; see also official-stockfish#1428. This does require rebuilding the relevant index, which will also become larger with a more relaxed bound.
Query the database instead of filtering the runs afterwards.
This ensures that the count of finished, non-deleted runs is correct.
This also fixes the pagination: some pages ended up with fewer than page_size (25) runs.
e.g.
This page has 0 runs: https://tests.stockfishchess.org/tests/user/VoyagerOne?page=244
And most of the pages in https://tests.stockfishchess.org/tests/finished don't have exactly 25 runs
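
To illustrate why filtering after pagination produces short pages and an inflated count, here is a small in-memory sketch. The data and the helper names `page_filter_after` and `page_filter_in_query` are hypothetical; plain lists stand in for the database, with slicing playing the role of `skip`/`limit`.

```python
PAGE_SIZE = 25

# Hypothetical data: 100 finished runs sorted by recency, every 4th one deleted.
runs = [{"id": i, "finished": True, "deleted": i % 4 == 0} for i in range(100)]


def page_filter_after(runs, page):
    """Paginate first, then drop deleted runs (the approach being replaced)."""
    chunk = runs[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
    return [r for r in chunk if not r.get("deleted")]


def page_filter_in_query(runs, page):
    """Filter first, then paginate (what filtering in the query achieves)."""
    visible = [r for r in runs if r["finished"] and not r.get("deleted")]
    return visible[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]


short_page = page_filter_after(runs, 0)    # 18 runs: 7 of the first 25 are deleted
full_page = page_filter_in_query(runs, 0)  # 25 runs
count_unfiltered = sum(1 for r in runs if r["finished"])
count_filtered = sum(1 for r in runs if r["finished"] and not r.get("deleted"))
```

In this toy data set, counting only on `finished` reports 100 runs while 75 are actually visible, which mirrors the kind of mismatch between the run count and the page contents described above.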