Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pulls/issue page very slow with >19000 repos #28055

Closed
somera opened this issue Nov 14, 2023 · 21 comments · Fixed by #28304
Closed

pulls/issue page very slow with >19000 repos #28055

somera opened this issue Nov 14, 2023 · 21 comments · Fixed by #28304
Labels
performance/speed performance issues with slow downs type/bug

Comments

@somera
Copy link

somera commented Nov 14, 2023

Description

On my gitea instance with 19000 repositories and 3200 ogas the pulls page need

image

in 1.20.5 it was fast.

Possible SQL query changes.

Gitea Version

1.21.0

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

2.42.0

Operating System

Ubuntu 20.04.x

How are you running Gitea?

gitea-1.21.0-linux-amd64

Database

PostgreSQL

@somera
Copy link
Author

somera commented Nov 14, 2023

In older gitea versions I reported some performance issues with many repos/mirrors and orgas. ;)

I try to analyse which query is the problem here.

@somera
Copy link
Author

somera commented Nov 14, 2023

I see

==> gitea.log <==
2023/11/14 17:45:15 ...eb/routing/logger.go:68:func1() [W] router: slow      GET /pulls for xx.xx.xx.xx:0, elapsed 3047.5ms @ user/home.go:329(user.Pulls)

and this

gitea.log:2023/11/14 17:51:40 ...sues/issue_search.go:499:IssueIDs() [I] [SQL] SELECT count(*) FROM "issue" INNER JOIN "repository" ON "issue".repo_id = "repository".id WHERE issue.repo_id=$1 AND (issue.is_closed=$2) AND (issue.is_pull=$3) [7954 false true] - 958.121µs
gitea.log:2023/11/14 17:51:40 ...sues/issue_search.go:499:IssueIDs() [I] [SQL] SELECT "issue".id FROM "issue" INNER JOIN "repository" ON "issue".repo_id = "repository".id WHERE issue.repo_id=$1 AND (issue.is_closed=$2) AND (issue.is_pull=$3) ORDER BY "issue"."updated_unix" DESC, "issue"."created_unix" DESC, "issue"."id" DESC [19500 false true] - 1.283654ms

is the problem:

$ grep "sues/issue_search.go:499:IssueIDs()" gitea.log | wc -l
38280

38280 queries on my instance for the pulls page. Wow.

Two queries for every repo in my gitea.

@somera somera changed the title pulls very slow pulls very slow with > 19000 repos Nov 14, 2023
@somera somera changed the title pulls very slow with > 19000 repos pulls page very slow with >19000 repos Nov 14, 2023
@somera somera changed the title pulls page very slow with >19000 repos pulls/issue page very slow with >19000 repos Nov 14, 2023
@lunny lunny added the performance/speed performance issues with slow downs label Nov 15, 2023
@lunny lunny added this to the 1.21.1 milestone Nov 15, 2023
@KazzmanK
Copy link
Contributor

what is the use for join here? Just to be sure what issue is still referenced by existing repo?
looks like repository join can easily be dropped here.

@somera
Copy link
Author

somera commented Nov 15, 2023

I have no issues in my gitea instance. I have a lot of mirrors and for some projects I'm using pull requests. At the moment all pull requests are closed.

The question is, why the change in 1.21.0? Why the new two selets for every repository?

@somera
Copy link
Author

somera commented Nov 15, 2023

@KazzmanK the design here is wrong.

When I have no issues but 1000000 project, then you check for every project; "is there an issue".

How did you come up with the idea of doing it like that?

@somera
Copy link
Author

somera commented Nov 15, 2023

No issues but ~39000 SQL queries.

issues:
image

pulls:
image

In 1.20.x it was fast. Why this change?

Or how much queries will be made if there are 2000 issues/pulls?

@lng2020
Copy link
Member

lng2020 commented Nov 15, 2023

Or how much queries will be made if there are 2000 issues/pulls?

We refactor the search engine here to support multiple engines, such as melisearch, es, etc. But when using the default db engine, it indeed somehow includes some duplicate search. I notice this abnormal behavior too.
The refactor PR: #25174

@somera
Copy link
Author

somera commented Nov 16, 2023

It's possible to fix this in 1.21.1?

@delvh delvh removed this from the 1.21.1 milestone Nov 25, 2023
@somera
Copy link
Author

somera commented Nov 25, 2023

@delvh will this be not fixed? Or not fixed in 1.21.1?

@somera
Copy link
Author

somera commented Nov 27, 2023

I sav whis was removed from 1.21.1

This is the time for issues page (with no existing issues) with 1.21.1

image

What is the status here?

@somera
Copy link
Author

somera commented Nov 28, 2023

Any updates here? Will this be fixed?

If yes -> in which release?

If not -> why not?

@lng2020
Copy link
Member

lng2020 commented Nov 29, 2023

If not -> why not?

AFAIK, no one is working on it.

@somera
Copy link
Author

somera commented Nov 29, 2023

Sounds not good. You changed the process. Cause with no issues and closed pull requests my Gitea instance needs 26-30 seconds.

@wolfogre
Copy link
Member

wolfogre commented Dec 1, 2023

It could be fixed by

// gitea/modules/indexer/issues/indexer.go
// CountIssuesByRepo counts issues by options and group by repo id.
// It's not a complete implementation, since it requires the caller should provide the repo ids.
// That means opts.RepoIDs must be specified, and opts.AllPublic must be false.
// It's good enough for the current usage, and it can be improved if needed.
// TODO: use "group by" of the indexer engines to implement it.
func CountIssuesByRepo(ctx context.Context, opts *SearchOptions) (map[int64]int64, error) {
+	if opts.Keyword == "" {
+		opt, _ := db.ToDBOptions(ctx, opts)
+		return issue_model.CountIssuesByRepo(ctx, opt)
+	}

But it doesn't really help because if you search for issues with a non-empty keyword, it will still be very slow when you have 19,000 repos.

And I believe #28304 can fix it by giving up the counts grouped by repo.

image

You may be curious as to why there are no counts grouped by repo in your screenshot. Actually, there are counts grouped by 19000 repos (that's why it's slow), but most of them have been hidden because the counts are zero.

And why was the old version fast? Because the search results could be incomplete, which is worse.

@somera
Copy link
Author

somera commented Dec 1, 2023

I'm using my gitea instance only with one user. I have 19200 repos and ~3100 orgas.

Issues
image

Pull-Requests
image

My question is, why the design is so?

When I take a look on my case. No issues, but the indexer is checking all repos? And all my pull-requests are closed. Means, the page should be just "loaded". If I open the 310 closed pull-requests the details will be loaded.

@somera
Copy link
Author

somera commented Dec 1, 2023

And with 1.20.x I saw the pull request when I opened the pull-requests tab. There was no problems and it needed ~500ms. Here is a screenshot from 1.20.5.

Screenshot (4672)

You see the difference.

@wolfogre
Copy link
Member

wolfogre commented Dec 1, 2023

@somera I fully understand your question, but I cannot explain it in a few sentences. If you want to understand the whole story, you need to read

Or, you may simply hope that the problem can be resolved. Please wait patiently.

@somera
Copy link
Author

somera commented Dec 1, 2023

@wolfogre thx. I will take a look.

lunny pushed a commit that referenced this issue Dec 7, 2023
It will fix #28268 .

<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">

<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">

## ⚠️ BREAKING ⚠️

But need to give up some features:

<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">

However, such abandonment may fix #28055 .

## Backgroud

When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.

<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">

It's a confusing design. Think about this: What does "In your
repositories" mean when the user switches to an org? Repos belong to the
user or the org?

Whatever, it has been broken by #26012 and its following PRs. After the
PR, it searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, so it causes #28268.

## How to fix it

It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.

But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"

<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">

Before #26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of #26012).

And to keep this, #26012 implement it in an inefficient way, just count
the issues by repo one by one, so it cannot work when `AllPublic` is
true because it's almost impossible to do this for all public repos.


https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338

## Give up unnecessary features

We may can resovle `TODO: use "group by" of the indexer engines to
implement it`, I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".

And the real question is, does it worth it? Why should we need to know
the counts grouped by repos?

Let me show you my search dashboard on gitea.com.

<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">

I never think the long repo list helps anything.

And if we agree to abandon it, things will be much easier. That is this
PR.

## TODO

I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.

<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">

The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.

Or please block this PR and help to complete it.

Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
brechtvl pushed a commit to blender/gitea that referenced this issue Dec 7, 2023
…8304)

It will fix go-gitea#28268 .

<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">

<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">

But need to give up some features:

<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">

However, such abandonment may fix go-gitea#28055 .

When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.

<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">

It's a confusing design. Think about this: What does "In your
repositories" mean when the user switches to an org? Repos belong to the
user or the org?

Whatever, it has been broken by go-gitea#26012 and its following PRs. After the
PR, it searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, so it causes go-gitea#28268.

It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.

But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"

<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">

Before go-gitea#26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of go-gitea#26012).

And to keep this, go-gitea#26012 implement it in an inefficient way, just count
the issues by repo one by one, so it cannot work when `AllPublic` is
true because it's almost impossible to do this for all public repos.

https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338

We may can resovle `TODO: use "group by" of the indexer engines to
implement it`, I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".

And the real question is, does it worth it? Why should we need to know
the counts grouped by repos?

Let me show you my search dashboard on gitea.com.

<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">

I never think the long repo list helps anything.

And if we agree to abandon it, things will be much easier. That is this
PR.

I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.

<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">

The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.

Or please block this PR and help to complete it.

Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
@somera
Copy link
Author

somera commented Dec 12, 2023

I can't reopen it.

This is how it looks like in 1.21.2 -> with the fix.

image

Nothing changes.

I don't understand why so much queries to show TWO open pull requests and just show the number with 338 closed pull requests.

The logged user hat ~3000 orgas and ~19000 repos.

@wolfogre ?

@wolfogre
Copy link
Member

@somera It hasn't been backported to v1.21.2 because it's breaking, see #28385 (comment)

Could you please try the latest nightly version? Or you can wait for v1.22.

@somera
Copy link
Author

somera commented Dec 12, 2023

Thx for the info. I can wait for 1.22.x

fuxiaohei pushed a commit to fuxiaohei/gitea that referenced this issue Jan 17, 2024
…8304)

It will fix go-gitea#28268 .

<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">

<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">

## ⚠️ BREAKING ⚠️

But need to give up some features:

<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">

However, such abandonment may fix go-gitea#28055 .

## Backgroud

When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.

<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">

It's a confusing design. Think about this: What does "In your
repositories" mean when the user switches to an org? Repos belong to the
user or the org?

Whatever, it has been broken by go-gitea#26012 and its following PRs. After the
PR, it searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, so it causes go-gitea#28268.

## How to fix it

It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.

But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"

<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">

Before go-gitea#26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of go-gitea#26012).

And to keep this, go-gitea#26012 implement it in an inefficient way, just count
the issues by repo one by one, so it cannot work when `AllPublic` is
true because it's almost impossible to do this for all public repos.


https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338

## Give up unnecessary features

We may can resovle `TODO: use "group by" of the indexer engines to
implement it`, I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".

And the real question is, does it worth it? Why should we need to know
the counts grouped by repos?

Let me show you my search dashboard on gitea.com.

<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">

I never think the long repo list helps anything.

And if we agree to abandon it, things will be much easier. That is this
PR.

## TODO

I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.

<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">

The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.

Or please block this PR and help to complete it.

Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 27, 2024
silverwind pushed a commit to silverwind/gitea that referenced this issue Feb 20, 2024
…8304)

It will fix go-gitea#28268 .

<img width="1313" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/cb1e07d5-7a12-4691-a054-8278ba255bfc">

<img width="1318" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/4fd60820-97f1-4c2c-a233-d3671a5039e9">

## ⚠️ BREAKING ⚠️

But need to give up some features:

<img width="1312" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/281c0d51-0e7d-473f-bbed-216e2f645610">

However, such abandonment may fix go-gitea#28055 .

## Backgroud

When the user switches the dashboard context to an org, it means they
want to search issues in the repos that belong to the org. However, when
they switch to themselves, it means all repos they can access because
they may have created an issue in a public repo that they don't own.

<img width="286" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/182dcd5b-1c20-4725-93af-96e8dfae5b97">

It's a confusing design. Think about this: What does "In your
repositories" mean when the user switches to an org? Repos belong to the
user or the org?

Whatever, it has been broken by go-gitea#26012 and its following PRs. After the
PR, it searches for issues in repos that the dashboard context user owns
or has been explicitly granted access to, so it causes go-gitea#28268.

## How to fix it

It's not really difficult to fix it. Just extend the repo scope to
search issues when the dashboard context user is the doer. Since the
user may create issues or be mentioned in any public repo, we can just
set `AllPublic` to true, which is already supported by indexers. The DB
condition will also support it in this PR.

But the real difficulty is how to count the search results grouped by
repos. It's something like "search issues with this keyword and those
filters, and return the total number and the top results. **Then, group
all of them by repo and return the counts of each group.**"

<img width="314" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/5206eb20-f8f5-49b9-b45a-1be2fcf679f4">

Before go-gitea#26012, it was being done in the DB, but it caused the results to
be incomplete (see the description of go-gitea#26012).

And to keep this, go-gitea#26012 implement it in an inefficient way, just count
the issues by repo one by one, so it cannot work when `AllPublic` is
true because it's almost impossible to do this for all public repos.


https://github.com/go-gitea/gitea/blob/1bfcdeef4cca0f5509476358e5931c13d37ed1ca/modules/indexer/issues/indexer.go#L318-L338

## Give up unnecessary features

We may can resovle `TODO: use "group by" of the indexer engines to
implement it`, I'm sure it can be done with Elasticsearch, but IIRC,
Bleve and Meilisearch don't support "group by".

And the real question is, does it worth it? Why should we need to know
the counts grouped by repos?

Let me show you my search dashboard on gitea.com.

<img width="1304" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/2bca2d46-6c71-4de1-94cb-0c9af27c62ff">

I never think the long repo list helps anything.

And if we agree to abandon it, things will be much easier. That is this
PR.

## TODO

I know it's important to filter by repos when searching issues. However,
it shouldn't be the way we have it now. It could be implemented like
this.

<img width="1316" alt="image"
src="https://github.com/go-gitea/gitea/assets/9418365/99ee5f21-cbb5-4dfe-914d-cb796cb79fbe">

The indexers support it well now, but it requires some frontend work,
which I'm not good at. So, I think someone could help do that in another
PR and merge this one to fix the bug first.

Or please block this PR and help to complete it.

Finally, "Switch dashboard context" is also a design that needs
improvement. In my opinion, it can be accomplished by adding filtering
conditions instead of "switching".
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance/speed performance issues with slow downs type/bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants