-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pulls/issue page very slow with >19000 repos #28055
Comments
In older gitea versions I reported some performance issues with many repos/mirrors and orgas. ;) I try to analyse which query is the problem here. |
I see
and this
is the problem:
38280 queries on my instance for the pulls page. Wow. Two queries for every repo in my gitea. |
what is the use for |
I have no issues in my gitea instance. I have a lot of mirrors and for some projects I'm using pull requests. At the moment all pull requests are closed. The question is, why the change in 1.21.0? Why the new two selets for every repository? |
@KazzmanK the design here is wrong. When I have no issues but 1000000 project, then you check for every project; "is there an issue". How did you come up with the idea of doing it like that? |
We refactor the search engine here to support multiple engines, such as melisearch, es, etc. But when using the default db engine, it indeed somehow includes some duplicate search. I notice this abnormal behavior too. |
It's possible to fix this in 1.21.1? |
@delvh will this be not fixed? Or not fixed in 1.21.1? |
Any updates here? Will this be fixed? If yes -> in which release? If not -> why not? |
AFAIK, no one is working on it. |
Sounds not good. You changed the process. Cause with no issues and closed pull requests my Gitea instance needs 26-30 seconds. |
It could be fixed by // gitea/modules/indexer/issues/indexer.go
// CountIssuesByRepo counts issues by options and group by repo id.
// It's not a complete implementation, since it requires the caller should provide the repo ids.
// That means opts.RepoIDs must be specified, and opts.AllPublic must be false.
// It's good enough for the current usage, and it can be improved if needed.
// TODO: use "group by" of the indexer engines to implement it.
func CountIssuesByRepo(ctx context.Context, opts *SearchOptions) (map[int64]int64, error) {
+ if opts.Keyword == "" {
+ opt, _ := db.ToDBOptions(ctx, opts)
+ return issue_model.CountIssuesByRepo(ctx, opt)
+ } But it doesn't really help because if you search for issues with a non-empty keyword, it will still be very slow when you have 19,000 repos. And I believe #28304 can fix it by giving up the counts grouped by repo. You may be curious as to why there are no counts grouped by repo in your screenshot. Actually, there are counts grouped by 19000 repos (that's why it's slow), but most of them have been hidden because the counts are zero. And why was the old version fast? Because the search results could be incomplete, which is worse. |
I'm using my gitea instance only with one user. I have 19200 repos and ~3100 orgas. My question is, why the design is so? When I take a look on my case. No issues, but the indexer is checking all repos? And all my pull-requests are closed. Means, the page should be just "loaded". If I open the 310 closed pull-requests the details will be loaded. |
@somera I fully understand your question, but I cannot explain it in a few sentences. If you want to understand the whole story, you need to read
Or, you may simply hope that the problem can be resolved. Please wait patiently. |
@wolfogre thx. I will take a look. |
It will fix #28268 . <img width="1313" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc"> <img width="1318" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9"> ##⚠️ BREAKING⚠️ But need to give up some features: <img width="1312" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610"> However, such abandonment may fix #28055 . ## Backgroud When the user switches the dashboard context to an org, it means they want to search issues in the repos that belong to the org. However, when they switch to themselves, it means all repos they can access because they may have created an issue in a public repo that they don't own. <img width="286" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97"> It's a confusing design. Think about this: What does "In your repositories" mean when the user switches to an org? Repos belong to the user or the org? Whatever, it has been broken by #26012 and its following PRs. After the PR, it searches for issues in repos that the dashboard context user owns or has been explicitly granted access to, so it causes #28268. ## How to fix it It's not really difficult to fix it. Just extend the repo scope to search issues when the dashboard context user is the doer. Since the user may create issues or be mentioned in any public repo, we can just set `AllPublic` to true, which is already supported by indexers. The DB condition will also support it in this PR. But the real difficulty is how to count the search results grouped by repos. It's something like "search issues with this keyword and those filters, and return the total number and the top results. **Then, group all of them by repo and return the counts of each group.**" <img width="314" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4"> Before #26012, it was being done in the DB, but it caused the results to be incomplete (see the description of #26012). And to keep this, #26012 implement it in an inefficient way, just count the issues by repo one by one, so it cannot work when `AllPublic` is true because it's almost impossible to do this for all public repos. https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338 ## Give up unnecessary features We may can resovle `TODO: use "group by" of the indexer engines to implement it`, I'm sure it can be done with Elasticsearch, but IIRC, Bleve and Meilisearch don't support "group by". And the real question is, does it worth it? Why should we need to know the counts grouped by repos? Let me show you my search dashboard on gitea.com. <img width="1304" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff"> I never think the long repo list helps anything. And if we agree to abandon it, things will be much easier. That is this PR. ## TODO I know it's important to filter by repos when searching issues. However, it shouldn't be the way we have it now. It could be implemented like this. <img width="1316" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe"> The indexers support it well now, but it requires some frontend work, which I'm not good at. So, I think someone could help do that in another PR and merge this one to fix the bug first. Or please block this PR and help to complete it. Finally, "Switch dashboard context" is also a design that needs improvement. In my opinion, it can be accomplished by adding filtering conditions instead of "switching".
…8304) It will fix go-gitea#28268 . <img width="1313" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc"> <img width="1318" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9"> But need to give up some features: <img width="1312" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610"> However, such abandonment may fix go-gitea#28055 . When the user switches the dashboard context to an org, it means they want to search issues in the repos that belong to the org. However, when they switch to themselves, it means all repos they can access because they may have created an issue in a public repo that they don't own. <img width="286" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97"> It's a confusing design. Think about this: What does "In your repositories" mean when the user switches to an org? Repos belong to the user or the org? Whatever, it has been broken by go-gitea#26012 and its following PRs. After the PR, it searches for issues in repos that the dashboard context user owns or has been explicitly granted access to, so it causes go-gitea#28268. It's not really difficult to fix it. Just extend the repo scope to search issues when the dashboard context user is the doer. Since the user may create issues or be mentioned in any public repo, we can just set `AllPublic` to true, which is already supported by indexers. The DB condition will also support it in this PR. But the real difficulty is how to count the search results grouped by repos. It's something like "search issues with this keyword and those filters, and return the total number and the top results. **Then, group all of them by repo and return the counts of each group.**" <img width="314" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4"> Before go-gitea#26012, it was being done in the DB, but it caused the results to be incomplete (see the description of go-gitea#26012). And to keep this, go-gitea#26012 implement it in an inefficient way, just count the issues by repo one by one, so it cannot work when `AllPublic` is true because it's almost impossible to do this for all public repos. https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338 We may can resovle `TODO: use "group by" of the indexer engines to implement it`, I'm sure it can be done with Elasticsearch, but IIRC, Bleve and Meilisearch don't support "group by". And the real question is, does it worth it? Why should we need to know the counts grouped by repos? Let me show you my search dashboard on gitea.com. <img width="1304" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff"> I never think the long repo list helps anything. And if we agree to abandon it, things will be much easier. That is this PR. I know it's important to filter by repos when searching issues. However, it shouldn't be the way we have it now. It could be implemented like this. <img width="1316" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe"> The indexers support it well now, but it requires some frontend work, which I'm not good at. So, I think someone could help do that in another PR and merge this one to fix the bug first. Or please block this PR and help to complete it. Finally, "Switch dashboard context" is also a design that needs improvement. In my opinion, it can be accomplished by adding filtering conditions instead of "switching".
I can't reopen it. This is how it looks like in 1.21.2 -> with the fix. Nothing changes. I don't understand why so much queries to show TWO open pull requests and just show the number with 338 closed pull requests. The logged user hat ~3000 orgas and ~19000 repos. |
@somera It hasn't been backported to v1.21.2 because it's breaking, see #28385 (comment) Could you please try the latest nightly version? Or you can wait for v1.22. |
Thx for the info. I can wait for 1.22.x |
…8304) It will fix go-gitea#28268 . <img width="1313" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc"> <img width="1318" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9"> ##⚠️ BREAKING⚠️ But need to give up some features: <img width="1312" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610"> However, such abandonment may fix go-gitea#28055 . ## Backgroud When the user switches the dashboard context to an org, it means they want to search issues in the repos that belong to the org. However, when they switch to themselves, it means all repos they can access because they may have created an issue in a public repo that they don't own. <img width="286" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97"> It's a confusing design. Think about this: What does "In your repositories" mean when the user switches to an org? Repos belong to the user or the org? Whatever, it has been broken by go-gitea#26012 and its following PRs. After the PR, it searches for issues in repos that the dashboard context user owns or has been explicitly granted access to, so it causes go-gitea#28268. ## How to fix it It's not really difficult to fix it. Just extend the repo scope to search issues when the dashboard context user is the doer. Since the user may create issues or be mentioned in any public repo, we can just set `AllPublic` to true, which is already supported by indexers. The DB condition will also support it in this PR. But the real difficulty is how to count the search results grouped by repos. It's something like "search issues with this keyword and those filters, and return the total number and the top results. **Then, group all of them by repo and return the counts of each group.**" <img width="314" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4"> Before go-gitea#26012, it was being done in the DB, but it caused the results to be incomplete (see the description of go-gitea#26012). And to keep this, go-gitea#26012 implement it in an inefficient way, just count the issues by repo one by one, so it cannot work when `AllPublic` is true because it's almost impossible to do this for all public repos. https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338 ## Give up unnecessary features We may can resovle `TODO: use "group by" of the indexer engines to implement it`, I'm sure it can be done with Elasticsearch, but IIRC, Bleve and Meilisearch don't support "group by". And the real question is, does it worth it? Why should we need to know the counts grouped by repos? Let me show you my search dashboard on gitea.com. <img width="1304" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff"> I never think the long repo list helps anything. And if we agree to abandon it, things will be much easier. That is this PR. ## TODO I know it's important to filter by repos when searching issues. However, it shouldn't be the way we have it now. It could be implemented like this. <img width="1316" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe"> The indexers support it well now, but it requires some frontend work, which I'm not good at. So, I think someone could help do that in another PR and merge this one to fix the bug first. Or please block this PR and help to complete it. Finally, "Switch dashboard context" is also a design that needs improvement. In my opinion, it can be accomplished by adding filtering conditions instead of "switching".
…8304) It will fix go-gitea#28268 . <img width="1313" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc"> <img width="1318" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9"> ##⚠️ BREAKING⚠️ But need to give up some features: <img width="1312" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610"> However, such abandonment may fix go-gitea#28055 . ## Backgroud When the user switches the dashboard context to an org, it means they want to search issues in the repos that belong to the org. However, when they switch to themselves, it means all repos they can access because they may have created an issue in a public repo that they don't own. <img width="286" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97"> It's a confusing design. Think about this: What does "In your repositories" mean when the user switches to an org? Repos belong to the user or the org? Whatever, it has been broken by go-gitea#26012 and its following PRs. After the PR, it searches for issues in repos that the dashboard context user owns or has been explicitly granted access to, so it causes go-gitea#28268. ## How to fix it It's not really difficult to fix it. Just extend the repo scope to search issues when the dashboard context user is the doer. Since the user may create issues or be mentioned in any public repo, we can just set `AllPublic` to true, which is already supported by indexers. The DB condition will also support it in this PR. But the real difficulty is how to count the search results grouped by repos. It's something like "search issues with this keyword and those filters, and return the total number and the top results. **Then, group all of them by repo and return the counts of each group.**" <img width="314" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4"> Before go-gitea#26012, it was being done in the DB, but it caused the results to be incomplete (see the description of go-gitea#26012). And to keep this, go-gitea#26012 implement it in an inefficient way, just count the issues by repo one by one, so it cannot work when `AllPublic` is true because it's almost impossible to do this for all public repos. https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338 ## Give up unnecessary features We may can resovle `TODO: use "group by" of the indexer engines to implement it`, I'm sure it can be done with Elasticsearch, but IIRC, Bleve and Meilisearch don't support "group by". And the real question is, does it worth it? Why should we need to know the counts grouped by repos? Let me show you my search dashboard on gitea.com. <img width="1304" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff"> I never think the long repo list helps anything. And if we agree to abandon it, things will be much easier. That is this PR. ## TODO I know it's important to filter by repos when searching issues. However, it shouldn't be the way we have it now. It could be implemented like this. <img width="1316" alt="image" src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe"> The indexers support it well now, but it requires some frontend work, which I'm not good at. So, I think someone could help do that in another PR and merge this one to fix the bug first. Or please block this PR and help to complete it. Finally, "Switch dashboard context" is also a design that needs improvement. In my opinion, it can be accomplished by adding filtering conditions instead of "switching".
Description
On my gitea instance with 19000 repositories and 3200 ogas the pulls page need
in 1.20.5 it was fast.
Possible SQL query changes.
Gitea Version
1.21.0
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
No response
Git Version
2.42.0
Operating System
Ubuntu 20.04.x
How are you running Gitea?
gitea-1.21.0-linux-amd64
Database
PostgreSQL
The text was updated successfully, but these errors were encountered: