Cache statistics and provide estimation methods #19474
Currently, whenever the Prometheus metrics endpoint or the `/admin` endpoint is viewed, the statistics are recalculated immediately, using COUNT rather than a less expensive method. This PR provides a mechanism to cache these statistics, avoids generating all of the metrics on the admin page, and provides an estimation method for the plain table counts. Fix go-gitea#17506 Signed-off-by: Andrew Thornton <[email protected]>
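For readers unfamiliar with how a plain table count can be estimated without running COUNT, here is a minimal sketch in Go. It is not the code from this PR: the function name `estimateCountSQL` and the dialect strings are hypothetical, and only the catalog views it queries (`pg_class`, `information_schema.TABLES`, `sys.partitions`) are standard features of the respective databases. All three return approximate values that depend on how recently the database refreshed its statistics.

```go
package main

import "fmt"

// estimateCountSQL returns a catalog query that approximates the number of
// rows in a table without scanning it. The function name and the dialect
// strings are hypothetical and are not taken from the PR.
func estimateCountSQL(dialect string) (string, error) {
	switch dialect {
	case "postgres":
		// reltuples is refreshed by VACUUM/ANALYZE, so it can lag behind.
		return "SELECT reltuples::bigint FROM pg_class WHERE relname = $1", nil
	case "mysql":
		// TABLE_ROWS is only an approximation for InnoDB tables.
		return "SELECT TABLE_ROWS FROM information_schema.TABLES " +
			"WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = ?", nil
	case "mssql":
		// index_id 0 (heap) or 1 (clustered index) holds the table's rows.
		return "SELECT SUM(rows) FROM sys.partitions " +
			"WHERE object_id = OBJECT_ID(@p1) AND index_id IN (0, 1)", nil
	default:
		return "", fmt.Errorf("no estimate query for dialect %q", dialect)
	}
}

func main() {
	for _, d := range []string{"postgres", "mysql", "mssql", "sqlite3"} {
		q, err := estimateCountSQL(d)
		if err != nil {
			// A caller would fall back to an exact COUNT(*) here.
			fmt.Println(d, "-> fallback:", err)
			continue
		}
		fmt.Println(d, "->", q)
	}
}
```

Dialects without a cheap catalog estimate return an error so that the caller can decide whether to run the exact count instead.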
Maybe it's database's responsibility to optimize the |
All (at least most) transactional database's |
LGTM, except why the accurate number is necessary or makes sense.
ensure that statistics are only calculated one at a time. Signed-off-by: Andrew Thornton <[email protected]>
Signed-off-by: Andrew Thornton <[email protected]>
Signed-off-by: Andrew Thornton <[email protected]>
Since there is a new related PR #19561, I still think the mechanism should be simplified to provide estimated numbers without a cache and without a setting option. Complexity is the enemy if it's not a must. If the accurate numbers don't really help admins, then it's not necessary to make it complex. It's not ideal to introduce more and more (not-that-useful) options. There are now 4 new options; if there are other similar requirements in the future, there will be 6 or 8 options that don't really help users. And simplified code would help other PRs like #19561 and maybe more.
I must absolutely disagree about the cache. We should be caching these statistics. It's insanely wasteful to be repeatedly recalculating these results every time you go to the admin dashboard. I also disagree about whether we should simply be providing estimates. If we drop to estimates by default, we will IMMEDIATELY receive complaints from people because the numbers are "incorrect". It's simply not worth the hassle.
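To make the caching argument concrete, the following is a minimal sketch of a time-limited statistics cache that also ensures the statistics are only calculated one at a time. The type names `Statistics` and `statsCache`, the fields, and the TTL value are assumptions made for illustration; they are not the PR's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Statistics is a hypothetical stand-in for the real statistics struct.
type Statistics struct {
	Users, Repos int64
}

// statsCache keeps the last computed Statistics for ttl. The mutex is held
// while computing, so concurrent callers wait instead of recomputing.
type statsCache struct {
	mu       sync.Mutex
	ttl      time.Duration
	computed time.Time
	value    Statistics
	valid    bool
	compute  func() Statistics
}

func newStatsCache(ttl time.Duration, compute func() Statistics) *statsCache {
	return &statsCache{ttl: ttl, compute: compute}
}

// Get returns the cached value while it is fresh and recomputes it otherwise.
func (c *statsCache) Get() Statistics {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.valid && time.Since(c.computed) < c.ttl {
		return c.value
	}
	c.value = c.compute()
	c.computed = time.Now()
	c.valid = true
	return c.value
}

func main() {
	calls := 0
	c := newStatsCache(time.Minute, func() Statistics {
		calls++ // stands in for the expensive COUNT queries
		return Statistics{Users: 42, Repos: 7}
	})
	c.Get()
	c.Get() // served from the cache, compute is not called again
	fmt.Println("compute calls:", calls)
}
```

Holding the lock during the computation is the simplest way to serialize recalculation; the cost is that a slow computation blocks other readers until it finishes.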
In terms of providing estimates for the non-simple counts, e.g. the UserCount, in postgres we can use: EXPLAIN SELECT COUNT(*) FROM `user` WHERE `type`=0; and postgres will return something like:
Extracting the MySQL also has EXPLAIN and will return the number of rows in a rows column, but I'm not sure whether it will provide a quicker estimate. For MSSQL we likely need to add a NONCLUSTERED COLUMNSTORE INDEX on the appropriate columns.
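As an illustration of pulling an estimate out of a PostgreSQL plan, here is a minimal sketch. It assumes a slight variation on the query above: EXPLAIN is run on the filtered SELECT without COUNT(*), so that the top plan node carries the row estimate rather than an aggregate node reporting a single row. The helper name `rowsFromExplain` is hypothetical, and the plan text in `main` is hand-written for illustration, not captured from a real server.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rowsRe matches the planner's row estimate, e.g. "rows=1200".
var rowsRe = regexp.MustCompile(`rows=(\d+)`)

// rowsFromExplain returns the first rows= estimate found in the plan text.
// The second result is false when the text contains no estimate.
func rowsFromExplain(plan string) (int64, bool) {
	m := rowsRe.FindStringSubmatch(plan)
	if m == nil {
		return 0, false
	}
	n, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	// Illustrative plan text only; real output depends on the server,
	// the table statistics and the chosen plan.
	plan := `Seq Scan on "user"  (cost=0.00..35.50 rows=1200 width=4)
  Filter: (type = 0)`
	n, ok := rowsFromExplain(plan)
	fmt.Println(n, ok)
}
```

The estimate is only as good as the planner's statistics, so it can drift noticeably on tables that have not been analyzed recently.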
I disagree that we should make code more complex just because of some nonsense complaints. There are 2 kinds of complaints: one is that the user is really hurt, the other is that the user doesn't know what they want or what they are doing. For this case, I believe it's kind two (users do not know what they want). Do users really know what the accurate And if you use a cache, when the underlying number keeps changing (like comments/actions), do users really get accurate numbers? These are already contradictions, because in the end they still read the estimated numbers instead of the accurate ones. Could there be some compromise plans that for small tables like
Signed-off-by: Andrew Thornton <[email protected]>
OK I've dropped the ESTIMATE_COUNTS option. You can field and answer the issues when they complain - because they will. |
Could we make a poll about how to continue?
Indeed, my personal opinion might be wrong (and it's not a blocker). If nobody else likes any of the idea, such improvement can not get merged. |
Can't we just cache values and reset cache on specific events (user created/deleted etc)? |
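The event-based alternative suggested here could look roughly like the sketch below: counts are cached indefinitely and dropped only when a relevant event occurs. The type `countCache`, its methods, and the idea of calling `Invalidate` from user created/deleted hooks are assumptions for illustration; nothing in the conversation confirms that such hooks exist in this form.

```go
package main

import (
	"fmt"
	"sync"
)

// countCache stores per-table counts until they are explicitly invalidated.
type countCache struct {
	mu     sync.Mutex
	counts map[string]int64
	load   func(table string) int64 // stands in for the expensive COUNT query
}

func newCountCache(load func(table string) int64) *countCache {
	return &countCache{counts: make(map[string]int64), load: load}
}

// Get returns the cached count for table, loading it on the first request.
func (c *countCache) Get(table string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n, ok := c.counts[table]; ok {
		return n
	}
	n := c.load(table)
	c.counts[table] = n
	return n
}

// Invalidate drops the cached count; a hypothetical "user created" or
// "user deleted" hook would call this for the affected table.
func (c *countCache) Invalidate(table string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.counts, table)
}

func main() {
	users := int64(3)
	c := newCountCache(func(table string) int64 { return users })
	fmt.Println(c.Get("user"))
	users++ // a user is created, but the cache has not been told yet
	fmt.Println(c.Get("user"))
	c.Invalidate("user") // the event hook fires
	fmt.Println(c.Get("user"))
}
```

This keeps the numbers exact as long as every write path triggers the invalidation, which is the main maintenance cost compared with a time-based cache.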
Signed-off-by: Andrew Thornton <[email protected]>
Signed-off-by: Andrew Thornton <[email protected]>
Seems fine apart from the entries missing in `custom/conf/app.example.ini`.
Also, what's the reason to estimate only some of the statistics, but not all of them?
After a long time, nobody has complained that "the action is not accurate". I have 100% confidence that my conclusion is right: no need to do more tricks for these large tables. In the following PR, I will do:
So this PR could be closed. |