
Fix count of finished runs #1428

Open
wants to merge 1 commit into master

Conversation

@dav1312 (Contributor) commented Sep 22, 2022

Query the database instead of filtering the runs later.
This ensures that the count of finished and not-deleted runs will be correct.
This also fixes the pagination: some pages ended up having fewer than page_size (25) runs.

e.g.
This page has 0 runs: https://tests.stockfishchess.org/tests/user/VoyagerOne?page=244
And most of the pages in https://tests.stockfishchess.org/tests/finished don't have exactly 25 runs
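
For readers skimming the diff, here is a minimal sketch of the idea (collection/database names and variables are assumptions for illustration, not the exact PR code): filter out deleted runs in the MongoDB query itself, so the total count and every page come from the same filtered set.

# Sketch only, assuming pymongo and a runs collection like fishtest's.
from pymongo import DESCENDING, MongoClient

runs = MongoClient()["fishtest_new"]["runs"]  # db/collection names are assumptions

page_size = 25
page = 244  # e.g. the page from the example link above
q = {"finished": True, "deleted": False}

# Count used for pagination: finished AND not deleted, in one query.
num_finished = runs.count_documents(q)

# One page of runs; every page now holds exactly page_size runs
# (except possibly the last one), since deleted runs are never fetched.
page_runs = list(
    runs.find(
        q,
        skip=(page - 1) * page_size,
        limit=page_size,
        sort=[("last_updated", DESCENDING)],
    )
)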

@vdbergh (Contributor) commented Sep 22, 2022

This may be slow without the appropriate indexes. But we’ll see.

@ppigazzini added the enhancement and server (server side changes) labels Sep 23, 2022
@ppigazzini (Collaborator)

DEV updated.
I found an old script to benchmark the MongoDB query.

@dav1312 (Contributor, Author) commented Sep 23, 2022

In dev it does feel very slow to change pages, idk if it's the change or the server ☹

Edit: 😳
[screenshot]

@ppigazzini (Collaborator) commented Sep 23, 2022

@dav1312 I'm commuting now, not able to test code. I wonder if the correct count of the documents is simply len(runs_list) with the master query and comprehension.
EDIT_000: no, the find() uses skip=skip, limit=limit too, so runs_list only holds one page

@vdbergh (Contributor) commented Sep 23, 2022

As expected it is very slow. The indexes (created in server/utils/create_indexes.py) should have a partial filter expression deleted=False I think.

@dav1312 (Contributor, Author) commented Sep 23, 2022

As expected it is very slow. The indexes (created in server/utils/create_indexes.py) should have a partial filter expression deleted=False I think.

Something like this?

     db["runs"].create_index(
         [("finished", ASCENDING), ("last_updated", DESCENDING)],
         name="finished_runs",
-        partialFilterExpression={"finished": True},
+        partialFilterExpression={"finished": True, "deleted": False},
     )

@peregrineshahin (Contributor) commented Sep 23, 2022

Can we get the two stats above for dev again with master? My guess is that querying with such a condition should always be faster than filtering in the code.
A MongoDB query should be faster than checking an if statement in a Python loop (which holds for basically any database).
Also, the filtering operation and the query run on the same resource, so the resource itself should not be an issue in principle.

@vdbergh (Contributor) commented Sep 23, 2022

As expected it is very slow. The indexes (created in server/utils/create_indexes.py) should have a partial filter expression deleted=False I think.

Something like this?

     db["runs"].create_index(
         [("finished", ASCENDING), ("last_updated", DESCENDING)],
         name="finished_runs",
-        partialFilterExpression={"finished": True},
+        partialFilterExpression={"finished": True, "deleted": False},
     )

I think (but I am not sure) that you also need to index on the deleted field (i.e. add ("deleted", ASCENDING) to the key). You also have to do it for the indexes containing is_green and is_yellow.
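
A sketch of what such definitions could look like in server/utils/create_indexes.py, keeping the existing naming pattern (key order and names are a guess at this point; they match the indexes ppigazzini posts further below):

from pymongo import ASCENDING, DESCENDING

# Guessed shape: add "deleted" to the key and to the partial filter of
# every index that serves the finished-runs pages.
db["runs"].create_index(
    [("finished", ASCENDING), ("deleted", ASCENDING), ("last_updated", DESCENDING)],
    name="finished_runs",
    partialFilterExpression={"finished": True, "deleted": False},
)
db["runs"].create_index(
    [
        ("finished", ASCENDING),
        ("deleted", ASCENDING),
        ("is_green", DESCENDING),
        ("last_updated", DESCENDING),
    ],
    name="finished_green_runs",
    partialFilterExpression={"finished": True, "deleted": False, "is_green": True},
)
db["runs"].create_index(
    [
        ("finished", ASCENDING),
        ("deleted", ASCENDING),
        ("is_yellow", DESCENDING),
        ("last_updated", DESCENDING),
    ],
    name="finished_yellow_runs",
    partialFilterExpression={"finished": True, "deleted": False, "is_yellow": True},
)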

@dav1312 (Contributor, Author) commented Sep 23, 2022

@vdbergh #1425 will enable more complex searches to be performed easily.
Like STC, STC + Green, LTC + Yellow, etc.
Should indexes be created for all of those cases too?

@vdbergh (Contributor) commented Sep 23, 2022

I suspect that something like green + LTC would need an index since it is the intersection of two large sets, hence not easy to compute by scanning.
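
Purely as an illustration, a hypothetical index for that intersection, following the same pattern as the other partial indexes (the name is made up, and the tc_base >= 40 cutoff for LTC is borrowed from the finished_ltc_runs index shown below):

from pymongo import ASCENDING, DESCENDING

# Hypothetical "green + LTC" index; mirrors the other partial indexes,
# with the LTC cutoff taken as tc_base >= 40.
db["runs"].create_index(
    [
        ("finished", ASCENDING),
        ("deleted", ASCENDING),
        ("is_green", DESCENDING),
        ("last_updated", DESCENDING),
        ("tc_base", DESCENDING),
    ],
    name="finished_green_ltc_runs",
    partialFilterExpression={
        "finished": True,
        "deleted": False,
        "is_green": True,
        "tc_base": {"$gte": 40},
    },
)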

@dav1312 (Contributor, Author) commented Sep 23, 2022

Yeah, prod doesn't like it too much

[screenshot]

But adding all of these combinations seems too tedious and definitely not scalable...

@vdbergh (Contributor) commented Sep 23, 2022

To me 300ms seems fine for a complex query (if I understand correctly your numbers).

@dav1312 (Contributor, Author) commented Sep 23, 2022

To me 300ms seems fine for a complex query (if I understand correctly your numbers).

I'm taking the values from freemonitoring, going to this page (Prod LTC + Greens) and pressing the Prev button to change pages a bunch of times.

@ppigazzini (Collaborator) commented Sep 23, 2022

Bench of the first proposed code (no indexes): master and PR are on par (limit = 25, as used in view.py)

elapsed time: 0.2326268172264099
elapsed time: 0.22734872102737427
#!/usr/bin/env python3
import time
from fishtest.rundb import RunDb
from pymongo import DESCENDING

def main():
    rdb = RunDb()
    limit = 25
    samples = 100

    q = { "$and": [ {"finished": True}, {"deleted": False} ] }
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i*limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = list(c)
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))

    q = {"finished": True}
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i*limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = [run for run in c if not run.get("deleted")]
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))

if __name__ == "__main__":
    main()

@ppigazzini (Collaborator) commented Sep 23, 2022

DEV with new indexes on runs collection.

The bench of the new query with the indexes doesn't improve (elapsed time: 0.22787032127380372)
The bench of the master query with the indexes is 200x slower! (elapsed time: 53.909343242645264)

Current indexes on runs:
{ '_id_': {'key': [('_id', 1)], 'v': 2},
  'finished_green_runs': { 'key': [('finished', 1), ('deleted', 1), ('is_green', -1), ('last_updated', -1)],
                           'partialFilterExpression': SON([('finished', True), ('deleted', False), ('is_green', True)]),
                           'v': 2},
  'finished_ltc_runs': { 'key': [('finished', 1), ('deleted', 1), ('last_updated', -1), ('tc_base', -1)],
                         'partialFilterExpression': SON([('finished', True), ('deleted', False), ('tc_base', SON([('$gte', 40)]))]),
                         'v': 2},
  'finished_runs': { 'key': [('finished', 1), ('deleted', 1), ('last_updated', -1)],
                     'partialFilterExpression': SON([('finished', True), ('deleted', False)]),
                     'v': 2},
  'finished_yellow_runs': { 'key': [('finished', 1), ('deleted', 1), ('is_yellow', -1), ('last_updated', -1)],
                            'partialFilterExpression': SON([('finished', True), ('deleted', False), ('is_yellow', True)]),
                            'v': 2},
  'unfinished_runs': { 'key': [('finished', 1), ('last_updated', 1)],
                       'partialFilterExpression': SON([('finished', False)]),
                       'v': 2},
  'user_runs': {'key': [('args.username', -1), ('last_updated', -1)], 'v': 2}}

@dav1312 (Contributor, Author) commented Sep 23, 2022

The bench of the master query with the indexes is 200x slower!

You mean the new indexes with master?

@ppigazzini (Collaborator) commented Sep 23, 2022

The bench of the master query with the indexes is 200x slower!

You mean the new indexes with master?

This code is very slow with the indexes (perhaps that is normal after adding the indexes).

    q = {"finished": True}
    beg_ts = time.time()
    for i in range(samples):
        c = rdb.runs.find(q, skip=i*limit, limit=limit, sort=[("last_updated", DESCENDING)])
        runs_list = [run for run in c if not run.get("deleted")]
    end_ts = time.time()
    print("elapsed time: {}".format((end_ts - beg_ts) / samples))

@vdbergh (Contributor) commented Sep 24, 2022

Here is an issue about the current indexes: #1243. It also contains information on how to debug indexes (which is tricky in MongoDB). EDIT: also see #1242.

@ppigazzini (Collaborator) commented Sep 24, 2022

DEV and indexes updated, here is the bench:

elapsed time: 0.20528266429901124
elapsed time: 0.20790423154830934

The pagination on DEV is very slow.

@dav1312 (Contributor, Author) commented Sep 24, 2022

So... do I add the new indexes or not? ☹
The thing is, before this I don't recall dev taking so long to load, but now every time I open it, it takes a very long time.

@vdbergh (Contributor) commented Sep 24, 2022

I think one should use explain to see if the new indexes are used. Unfortunately I have no time now. But see #1242.
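
For anyone picking this up, a minimal sketch of such a check (query and setup mirror the bench script above; this is illustrative, not a tested snippet): if the winning plan contains an IXSCAN naming the new partial index, the index is used; a COLLSCAN means it is not.

from fishtest.rundb import RunDb
from pymongo import DESCENDING

rdb = RunDb()
q = {"finished": True, "deleted": False}

# Ask MongoDB which plan it picks for the paginated query.
plan = rdb.runs.find(
    q, skip=0, limit=25, sort=[("last_updated", DESCENDING)]
).explain()
print(plan["queryPlanner"]["winningPlan"])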

@dubslow (Contributor) commented Feb 17, 2023

Perhaps this is dumb, but why do we call collection.count_documents(q) when we've already called (or will anyway call) list(collection.find(q))? Especially when the list is filtered further after the find(q)?
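
One hedged reading of this suggestion (illustrative only, and only applicable on code paths that fetch the whole result set rather than a skip/limit page, per EDIT_000 above):

from fishtest.rundb import RunDb
from pymongo import DESCENDING

rdb = RunDb()

# Illustrative sketch: when a code path already materializes every
# matching run (no skip/limit), the count can come from the fetched
# list instead of a separate count_documents() round trip.
q = {"finished": True, "deleted": False}
runs_list = list(rdb.runs.find(q, sort=[("last_updated", DESCENDING)]))
num_finished_runs = len(runs_list)  # no extra count_documents(q) call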

dubslow added a commit to dubslow/fishtest that referenced this pull request Feb 17, 2023
Also, essentially by necessity, fix the num_finished_runs count which is inaccurate in master, see also official-stockfish#1428
dubslow added a commit to dubslow/fishtest that referenced this pull request Feb 17, 2023
Also, essentially by necessity, fix the num_finished_runs count which is inaccurate in master, see also official-stockfish#1428

This does require rebuilding the relevant index, which will also become larger with a more relaxed bound
Labels: enhancement, server (server side changes)
5 participants