
Fix count of finished runs #1428

Open
wants to merge 1 commit into master

Conversation

@dav1312 (Contributor) commented Sep 22, 2022

Query the database instead of filtering the runs later.
This ensures that the count of finished and not-deleted runs will be correct.
This also fixes the pagination: some pages ended up having fewer than page_size (25) runs.

e.g.
This page has 0 runs: https://tests.stockfishchess.org/tests/user/VoyagerOne?page=244
And most of the pages in https://tests.stockfishchess.org/tests/finished don't have exactly 25 runs
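
For readers skimming the diff, here is a minimal sketch of the idea (collection/database names and variables are assumptions for illustration, not the exact PR code): filter out deleted runs in the MongoDB query itself, so the total count and every page come from the same filtered set.

# Sketch only, assuming pymongo and a runs collection like fishtest's.
from pymongo import DESCENDING, MongoClient

runs = MongoClient()["fishtest_new"]["runs"]  # db/collection names are assumptions

page_size = 25
page = 244  # e.g. the page from the example link above
q = {"finished": True, "deleted": False}

# Count used for pagination: finished AND not deleted, in one query.
num_finished = runs.count_documents(q)

# One page of runs; every page now holds exactly page_size runs
# (except possibly the last one), since deleted runs are never fetched.
page_runs = list(
    runs.find(
        q,
        skip=(page - 1) * page_size,
        limit=page_size,
        sort=[("last_updated", DESCENDING)],
    )
)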

@vdbergh (Contributor) commented Sep 22, 2022

This may be slow without the appropriate indexes. But we’ll see.

@ppigazzini added the enhancement and server (server side changes) labels Sep 23, 2022
@ppigazzini (Collaborator)

DEV updated.
I found an old script to benchmark the MongoDB query.

@dav1312 (Contributor, Author) commented Sep 23, 2022

In dev it does feel very slow to change pages, idk if it's the change or the server ☹

Edit: 😳
[screenshot]

@ppigazzini (Collaborator) commented Sep 23, 2022

@dav1312 I'm commuting now, not able to test code. I wonder if the correct count of the documents is simply len(runs_list) with the master query and comprehension.
EDIT_000: no, the find() uses skip=skip, limit=limit too, so runs_list only holds one page

@vdbergh (Contributor) commented Sep 23, 2022

As expected it is very slow. The indexes (created in server/utils/create_indexes.py) should have a partial filter expression deleted=False I think.

@dav1312 (Contributor, Author) commented Sep 23, 2022

As expected it is very slow. The indexes (created in server/utils/create_indexes.py) should have a partial filter expression deleted=False I think.

Something like this?

     db["runs"].create_index(
         [("finished", ASCENDING), ("last_updated", DESCENDING)],
         name="finished_runs",
-        partialFilterExpression={"finished": True},
+        partialFilterExpression={"finished": True, "deleted": False},
     )

@peregrineshahin (Contributor) commented Sep 23, 2022

Can we get the two stats above for dev again with master? My guess is that querying with such a condition should always be faster than filtering in the code.
A MongoDB query should be faster than checking an if statement in a Python loop (which holds for basically any database).
Also, the filtering operation and the query run on the same resource, so the resource itself should not be an issue in principle.

@vdbergh (Contributor) commented Sep 23, 2022

As expected it is very slow. The indexes (created in server/utils/create_indexes.py) should have a partial filter expression deleted=False I think.

Something like this?

     db["runs"].create_index(
         [("finished", ASCENDING), ("last_updated", DESCENDING)],
         name="finished_runs",
-        partialFilterExpression={"finished": True},
+        partialFilterExpression={"finished": True, "deleted": False},
     )

I think (but I am not sure) that you also need to index on the deleted field (i.e. add ("deleted", ASCENDING) to the key). You also have to do it for the indexes containing is_green and is_yellow.
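
A sketch of what such definitions could look like in server/utils/create_indexes.py, keeping the existing naming pattern (key order and names are a guess at this point; they match the indexes ppigazzini posts further below):

from pymongo import ASCENDING, DESCENDING

# Guessed shape: add "deleted" to the key and to the partial filter of
# every index that serves the finished-runs pages.
db["runs"].create_index(
    [("finished", ASCENDING), ("deleted", ASCENDING), ("last_updated", DESCENDING)],
    name="finished_runs",
    partialFilterExpression={"finished": True, "deleted": False},
)
db["runs"].create_index(
    [
        ("finished", ASCENDING),
        ("deleted", ASCENDING),
        ("is_green", DESCENDING),
        ("last_updated", DESCENDING),
    ],
    name="finished_green_runs",
    partialFilterExpression={"finished": True, "deleted": False, "is_green": True},
)
db["runs"].create_index(
    [
        ("finished", ASCENDING),
        ("deleted", ASCENDING),
        ("is_yellow", DESCENDING),
        ("last_updated", DESCENDING),
    ],
    name="finished_yellow_runs",
    partialFilterExpression={"finished": True, "deleted": False, "is_yellow": True},
)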

@dav1312 (Contributor, Author) commented Sep 23, 2022

@vdbergh #1425 will enable more complex searches to be performed easily.
Like STC, STC + Green, LTC + Yellow, etc.
Should indexes be created for all of those cases too?

@vdbergh (Contributor) commented Sep 23, 2022

I suspect that something like green + LTC would need an index since it is the intersection of two large sets, hence not easy to compute by scanning.
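
Purely as an illustration, a hypothetical index for that intersection, following the same pattern as the other partial indexes (the name is made up, and the tc_base >= 40 cutoff for LTC is borrowed from the finished_ltc_runs index shown below):

from pymongo import ASCENDING, DESCENDING

# Hypothetical "green + LTC" index; mirrors the other partial indexes,
# with the LTC cutoff taken as tc_base >= 40.
db["runs"].create_index(
    [
        ("finished", ASCENDING),
        ("deleted", ASCENDING),
        ("is_green", DESCENDING),
        ("last_updated", DESCENDING),
        ("tc_base", DESCENDING),
    ],
    name="finished_green_ltc_runs",
    partialFilterExpression={
        "finished": True,
        "deleted": False,
        "is_green": True,
        "tc_base": {"$gte": 40},
    },
)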

@dav1312 (Contributor, Author) commented Sep 23, 2022

Yeah, prod doesn't like it too much

[screenshot]

But adding all of these combinations seems too tedious and definitely not scalable...

@vdbergh (Contributor) commented Sep 23, 2022

To me 300ms seems fine for a complex query (if I understand correctly your numbers).

@dav1312 (Contributor, Author) commented Sep 23, 2022

To me 300ms seems fine for a complex query (if I understand correctly your numbers).

I'm taking the values from freemonitoring, going to this page (Prod LTC + Greens) and pressing the Prev button to change pages a bunch of times.

@ppigazzini (Collaborator) commented Sep 23, 2022

Bench of the first proposed code (no indexes): master and PR are on par (limit = 25, as used in view.py)

elapsed time: 0.2326268172264099
elapsed time: 0.22734872102737427
#!/usr/bin/env python3
import time
from fishtest.rundb import RunDb
from pymongo import DESCENDING

def main():
    rdb = RunDb()
    limit = 25
    samples = 100

    q = { "$and": [ {"finished": True}, {"deleted": False} ] }
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i*limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = list(c)
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))

    q = {"finished": True}
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i*limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = [run for run in c if not run.get("deleted")]
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))

if __name__ == "__main__":
    main()

@ppigazzini (Collaborator) commented Sep 23, 2022

DEV with new indexes on runs collection.

The bench of the new query with the indexes doesn't improve (elapsed time: 0.22787032127380372)
The bench of the master query with the indexes is 200x slower! (elapsed time: 53.909343242645264)

Current indexes on runs:
{ '_id_': {'key': [('_id', 1)], 'v': 2},
  'finished_green_runs': { 'key': [('finished', 1), ('deleted', 1), ('is_green', -1), ('last_updated', -1)],
                           'partialFilterExpression': SON([('finished', True), ('deleted', False), ('is_green', True)]),
                           'v': 2},
  'finished_ltc_runs': { 'key': [('finished', 1), ('deleted', 1), ('last_updated', -1), ('tc_base', -1)],
                         'partialFilterExpression': SON([('finished', True), ('deleted', False), ('tc_base', SON([('$gte', 40)]))]),
                         'v': 2},
  'finished_runs': { 'key': [('finished', 1), ('deleted', 1), ('last_updated', -1)],
                     'partialFilterExpression': SON([('finished', True), ('deleted', False)]),
                     'v': 2},
  'finished_yellow_runs': { 'key': [('finished', 1), ('deleted', 1), ('is_yellow', -1), ('last_updated', -1)],
                            'partialFilterExpression': SON([('finished', True), ('deleted', False), ('is_yellow', True)]),
                            'v': 2},
  'unfinished_runs': { 'key': [('finished', 1), ('last_updated', 1)],
                       'partialFilterExpression': SON([('finished', False)]),
                       'v': 2},
  'user_runs': {'key': [('args.username', -1), ('last_updated', -1)], 'v': 2}}

@dav1312 (Contributor, Author) commented Sep 23, 2022

The bench of the master query with the indexes is 200x slower!

You mean the new indexes with master?

@ppigazzini (Collaborator) commented Sep 23, 2022

The bench of the master query with the indexes is 200x slower!

You mean the new indexes with master?

This code is very slow with the indexes (perhaps that is normal after adding the indexes).

    q = {"finished": True}
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i*limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = [run for run in c if not run.get("deleted")]
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))

@vdbergh (Contributor) commented Sep 24, 2022

Here is an issue about the current indexes: #1243. It also contains information on how to debug indexes (which is tricky in MongoDB). EDIT: also see #1242.

@ppigazzini (Collaborator) commented Sep 24, 2022

DEV and indexes updated, here is the bench:

elapsed time: 0.20528266429901124
elapsed time: 0.20790423154830934

The pagination on DEV is very slow.

@dav1312 (Contributor, Author) commented Sep 24, 2022

So... do I add the new indexes or not? ☹
The thing is, before this I don't recall dev taking so long to load, but now every time I open it, it takes a very long time.

@vdbergh (Contributor) commented Sep 24, 2022

I think one should use explain to see if the new indexes are used. Unfortunately I have no time now. But see #1242.
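
For anyone picking this up, a minimal sketch of such a check (query and setup mirror the bench script above; this is illustrative, not a tested snippet): if the winning plan contains an IXSCAN naming the new partial index, the index is used; a COLLSCAN means it is not.

from fishtest.rundb import RunDb
from pymongo import DESCENDING

rdb = RunDb()
q = {"finished": True, "deleted": False}

# Ask MongoDB which plan it picks for the paginated query.
plan = rdb.runs.find(
    q, skip=0, limit=25, sort=[("last_updated", DESCENDING)]
).explain()
print(plan["queryPlanner"]["winningPlan"])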

@dubslow (Contributor) commented Feb 17, 2023

Perhaps this is dumb, but why do we call collection.count_documents(q) when we've already called (or will anyway call) list(collection.find(q))? Especially when the list is filtered further after the find(q)?
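
One hedged reading of this suggestion (illustrative only, and only applicable on code paths that fetch the whole result set rather than a skip/limit page, per EDIT_000 above):

from fishtest.rundb import RunDb
from pymongo import DESCENDING

rdb = RunDb()

# Illustrative sketch: when a code path already materializes every
# matching run (no skip/limit), the count can come from the fetched
# list instead of a separate count_documents() round trip.
q = {"finished": True, "deleted": False}
runs_list = list(rdb.runs.find(q, sort=[("last_updated", DESCENDING)]))
num_finished_runs = len(runs_list)  # no extra count_documents(q) call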

dubslow added a commit to dubslow/fishtest that referenced this pull request Feb 17, 2023
Also, essentially by necessity, fix the num_finished_runs count which is inaccurate in master, see also official-stockfish#1428
dubslow added a commit to dubslow/fishtest that referenced this pull request Feb 17, 2023
Also, essentially by necessity, fix the num_finished_runs count which is inaccurate in master, see also official-stockfish#1428

This does require rebuilding the relevant index, which will also become larger with a more relaxed bound
Labels: enhancement, server (server side changes)
5 participants