
Task running since 2 weeks #16113

Closed · somera opened this issue Jun 8, 2021 · 25 comments · Fixed by #17991 or #19454
Labels: type/bug

somera commented Jun 8, 2021

  • Gitea version (or commit ref): 1.14.2
  • Git version:
  • Operating system: Ubuntu 20.04
  • Database (use [x]):
    • [x] PostgreSQL
    • [ ] MySQL
    • [ ] MSSQL
    • [ ] SQLite
  • Log gist:

Description

Today I found a task which has been running for two weeks. There is no system git process.

And it is not possible to remove it.

Screenshots

[two screenshots of the stuck task]

lunny added the type/bug label Jun 9, 2021
zeripath (Contributor) commented Jun 9, 2021

Interesting - if there's no related system git process then there's no task - so somehow the process manager hasn't been informed that the process has died.

Unfortunately, with it being two weeks ago, you're unlikely to have any logs to help us diagnose what happened, so all we can do is double-check the process manager code. However, it's likely I've already fixed this by returning a synthetic cancel that does the cleanup of the process manager in one step.

In terms of your problem - if there's genuinely no system git process - just restart Gitea.
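
As an aside, here is a minimal sketch of the "synthetic cancel" idea described above: the cancel handed back to callers also deregisters the process, so cancellation and manager cleanup happen in one step. All names here are illustrative, not Gitea's actual API:

```go
package process

import (
	"context"
	"sync"
)

// Manager keeps a registry of running tasks so they can be listed
// and cancelled from the admin UI.
type Manager struct {
	mu      sync.Mutex
	nextPID int64
	entries map[int64]string // pid -> description
}

func NewManager() *Manager {
	return &Manager{entries: make(map[int64]string)}
}

func (pm *Manager) add(desc string) int64 {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	pm.nextPID++
	pm.entries[pm.nextPID] = desc
	return pm.nextPID
}

func (pm *Manager) remove(pid int64) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	delete(pm.entries, pid)
}

// AddContext registers a task and returns a context plus a
// "synthetic" cancel: calling it both cancels the context and
// removes the registry entry, so a task can never be cancelled
// yet remain listed.
func (pm *Manager) AddContext(parent context.Context, desc string) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	pid := pm.add(desc)
	return ctx, func() {
		cancel()
		pm.remove(pid)
	}
}
```

With this shape, a caller that does `defer cancel()` cannot end up with a cancelled-but-still-listed task.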

somera (Author) commented Jun 9, 2021

I restarted Gitea yesterday. ;)

I have logs. I'll check them for errors.

somera (Author) commented Jul 29, 2021

Gitea 1.14.5

[screenshot]

somera (Author) commented Jul 29, 2021

There isn't any git process

```
$ sudo ps aux | grep git
git          723  5.1  6.0 3089384 464984 ?      Ssl  Jul24 385:03 /home/git/bin/gitea web -c /etc/gitea/app.ini
postgres  816678  0.4  0.4 649512 34936 ?        Ss   16:13   0:02 postgres: 13/main: postgres giteadb 192.168.178.20(64337) idle
postgres  816740  0.0  0.3 637684 28100 ?        Ss   16:13   0:00 postgres: 13/main: postgres giteadb 192.168.178.20(62246) idle
postgres  827016  0.1  0.6 637780 47948 ?        Ss   16:20   0:00 postgres: 13/main: gitea giteadb 127.0.0.1(40602) idle
postgres  827152  0.1  0.5 637912 43756 ?        Ss   16:20   0:00 postgres: 13/main: gitea giteadb 127.0.0.1(40646) idle
```

somera (Author) commented Jul 29, 2021

I can only restart Gitea. But stopping it needs more than 2 minutes, because something hangs:

```
2021/07/29 16:26:07 ...eful/manager_unix.go:127:handleSignals() [W] PID 723. Received SIGTERM. Shutting down...
2021/07/29 16:26:07 ...ue/queue_bytefifo.go:194:Shutdown() [D] level: issue_indexer Shutdown
2021/07/29 16:26:07 ...ue/queue_bytefifo.go:194:Shutdown() [D] level: notification-service-level Shutdown
2021/07/29 16:26:07 ...ue/queue_bytefifo.go:194:Shutdown() [D] unique-level: repo_stats_update-level Shutdown
2021/07/29 16:26:07 ...ue/queue_bytefifo.go:194:Shutdown() [D] level: mail-level Shutdown
2021/07/29 16:26:07 ...ue/queue_bytefifo.go:194:Shutdown() [D] level: task-level Shutdown
2021/07/29 16:26:07 ...ueue_disk_channel.go:243:Shutdown() [D] PersistableChannelQueue: notification-service Shutdown
2021/07/29 16:26:07 ...ue/queue_bytefifo.go:194:Shutdown() [D] level: push_update-level Shutdown
2021/07/29 16:26:07 ...ue/queue_bytefifo.go:194:Shutdown() [D] unique-level: pr_patch_checker-level Shutdown
2021/07/29 16:26:07 ...ueue_disk_channel.go:229:Shutdown() [D] PersistableChannelUniqueQueue: repo_stats_update Shutdown
2021/07/29 16:26:07 ...s/graceful/server.go:169:Serve() [D] Waiting for connections to finish... (PID: 723)
2021/07/29 16:26:07 ...ueue_disk_channel.go:243:Shutdown() [D] PersistableChannelQueue: mail Shutdown
2021/07/29 16:26:07 ...ueue_disk_channel.go:243:Shutdown() [D] PersistableChannelQueue: task Shutdown
2021/07/29 16:26:07 ...ueue_disk_channel.go:243:Shutdown() [D] PersistableChannelQueue: push_update Shutdown
2021/07/29 16:26:07 ...ueue_disk_channel.go:229:Shutdown() [D] PersistableChannelUniqueQueue: pr_patch_checker Shutdown
2021/07/29 16:26:07 ...eful/server_hooks.go:47:doShutdown() [I] PID: 723 Listener ([::]:3000) closed.
2021/07/29 16:26:07 ...s/graceful/server.go:175:Serve() [D] Serve() returning... (PID: 723)
2021/07/29 16:26:07 cmd/web.go:237:listen() [I] HTTP Listener: 0.0.0.0:3000 Closed
2021/07/29 16:27:07 .../graceful/manager.go:217:doHammerTime() [W] Setting Hammer condition
```
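
Note the one-minute gap between the SIGTERM at 16:26:07 and "Setting Hammer condition" at 16:27:07: that matches Gitea's graceful-shutdown hammer timeout. As a sketch of where that knob lives (60s is the documented default, so this block only illustrates the setting):

```ini
[server]
; After a shutdown is requested, wait this long for in-flight work,
; then "hammer" (forcibly terminate) whatever is still running.
GRACEFUL_HAMMER_TIME = 60s
```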

somera (Author) commented Jul 29, 2021

Looks like there is a problem with https://github.com/akka/doc.akka.io.

lunny (Member) commented Jul 29, 2021

> There isn't any git process
>
> $ sudo ps aux | grep git
> git          723  5.1  6.0 3089384 464984 ?      Ssl  Jul24 385:03 /home/git/bin/gitea web -c /etc/gitea/app.ini
> postgres  816678  0.4  0.4 649512 34936 ?        Ss   16:13   0:02 postgres: 13/main: postgres giteadb 192.168.178.20(64337) idle
> postgres  816740  0.0  0.3 637684 28100 ?        Ss   16:13   0:00 postgres: 13/main: postgres giteadb 192.168.178.20(62246) idle
> postgres  827016  0.1  0.6 637780 47948 ?        Ss   16:20   0:00 postgres: 13/main: gitea giteadb 127.0.0.1(40602) idle
> postgres  827152  0.1  0.5 637912 43756 ?        Ss   16:20   0:00 postgres: 13/main: gitea giteadb 127.0.0.1(40646) idle

That means the record wasn't removed from the process list when the process ended.

somera (Author) commented Jul 29, 2021

> That means the record wasn't removed from the process list when the process ended.

Looks like it. But stopping Gitea took too long; something was blocked. I restart my Gitea once or twice a month, only when a reboot is required.

somera (Author) commented Aug 6, 2021

@lunny I have now removed this mirror to work around my "problem".

But there is still a bug in Gitea. You can close this issue or leave it open.

zeripath (Contributor) commented Aug 8, 2021

@somera but I don't understand how this is happening.

I just don't. Let's look at the code:

```go
ctx, cancel := context.WithTimeout(c.parentContext, timeout)
defer cancel()

cmd := exec.CommandContext(ctx, c.name, c.args...)
if env == nil {
	cmd.Env = os.Environ()
} else {
	cmd.Env = env
}
cmd.Env = append(
	cmd.Env,
	fmt.Sprintf("LC_ALL=%s", DefaultLocale),
	// avoid prompting for credentials interactively, supported since git v2.3
	"GIT_TERMINAL_PROMPT=0",
)
// TODO: verify if this is still needed in golang 1.15
if goVersionLessThan115 {
	cmd.Env = append(cmd.Env, "GODEBUG=asyncpreemptoff=1")
}
cmd.Dir = dir
cmd.Stdout = stdout
cmd.Stderr = stderr
cmd.Stdin = stdin
if err := cmd.Start(); err != nil {
	return err
}

desc := c.desc
if desc == "" {
	desc = fmt.Sprintf("%s %s %s [repo_path: %s]", GitExecutable, c.name, strings.Join(c.args, " "), dir)
}
pid := process.GetManager().Add(desc, cancel)
defer process.GetManager().Remove(pid)

if fn != nil {
	err := fn(ctx, cancel)
	if err != nil {
		cancel()
		_ = cmd.Wait()
		return err
	}
}

if err := cmd.Wait(); err != nil && ctx.Err() != context.DeadlineExceeded {
	return err
}

return ctx.Err()
```

We create a context with its own cancel, which will be cancelled at the end of the function.

Then we set up the command.

We register the command and its cancel with the process manager; it will be deregistered at the end of the function:

```go
pid := process.GetManager().Add(desc, cancel)
defer process.GetManager().Remove(pid)
```

We then run the command and wait for it to end.


When the command terminates - Go will tell us here:

```go
if err := cmd.Wait(); err != nil && ctx.Err() != context.DeadlineExceeded {
```

and so the deferred process.GetManager().Remove(pid) will lead to the process being removed from the manager.

If you click the cancel button on the process manager page, the context created by context.WithTimeout above will be cancelled - which will lead to Go killing the process - and then the cmd.Wait() will end and ctx.Err() will be returned.


So how does anything get stuck in the process manager?

It doesn't make sense.
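
For illustration, one hedged way a task can outlive its OS process: per the os/exec documentation, cmd.Wait() also waits for the goroutine that copies cmd.Stdin into the child, so if that reader's Read never returns, Wait blocks - and the deferred process.GetManager().Remove(pid) never runs - even after the child has exited. A self-contained sketch (the blocking reader is contrived for demonstration):

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// blockingReader simulates a stdin whose Read never returns,
// e.g. a pipe that no one ever closes or writes to.
type blockingReader struct{}

func (blockingReader) Read(p []byte) (int, error) {
	select {} // block forever
}

func main() {
	cmd := exec.Command("true") // child exits immediately, never reads stdin
	cmd.Stdin = blockingReader{}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		fmt.Println("Wait returned:", err)
	case <-time.After(2 * time.Second):
		// The OS process is long gone, but Wait is still blocked on the
		// stdin-copying goroutine - so a `defer Remove(pid)` guarding a
		// process-manager entry would never run.
		fmt.Println("child exited, but Wait is still blocked")
	}
}
```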

somera (Author) commented Aug 8, 2021

My Gitea instance updates the mirrors every 24h. This project was a mirror. After some days, the one mirror process in Gitea was still "running". Stopping the process wasn't working, and there was no git Linux process. Stopping the Gitea process in this case also took longer than normal, as if something was blocking.

zeripath (Contributor) commented Aug 8, 2021

Yes, I understand what you're saying, but I still don't understand how this can be happening.

The only place such a problem could occur is in cmd.Wait() which implies a deep Go problem or OS problem.

If there's really no process in the OS table then why isn't cmd.Wait finishing?

somera (Author) commented Aug 8, 2021

> Yes, I understand what you're saying, but I still don't understand how this can be happening.
>
> The only place such a problem could occur is in cmd.Wait() which implies a deep Go problem or OS problem.

An Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-80-generic x86_64) problem? I'd say no, because then the probability of other repos being affected would be higher.

What I saw was that only this one specific repo has the problem.

You can add the mirror (https://github.com/akka/doc.akka.io.git) on your instance and take a look at this. I'll do it again.

> If there's really no process in the OS table then why isn't cmd.Wait finishing?

I wasn't checking this every day. ;) That means I found it days later. Here (#16113 (comment)) you can see two mirror processes.

On my Gitea instance I have 5804 mirrors, ~99% of them from GitHub. The sync process needs ~3h. If you make too many requests to GitHub, you have to wait. I don't know what happens when Gitea updates so many repos in a row - do you? But if there were a problem with GitHub, other repos would be affected too.

somera (Author) commented Aug 8, 2021

After our conversation I started a new mirror for the project. And the result is:

[screenshot]

There is no process. I didn't find anything special in my logs.

somera (Author) commented Aug 8, 2021

Cancelling the process should be fixed in 1.16.0.

6543 (Member) commented Aug 31, 2021

@somera I think this is also about the "pipe leaks" we just recently fixed: #16886, #16894, #16899

Can you confirm they are now gone? (current master branch or latest release/1.15 branch)

somera (Author) commented Aug 31, 2021

> @somera I think this is also about the "pipe leaks" we just recently fixed: #16886, #16894, #16899
>
> Can you confirm they are now gone? (current master branch or latest release/1.15 branch)

Sorry, I can't confirm this, because I'm only using release versions on my system.

I'll try to set up a new Gitea system for testing.

somera (Author) commented Oct 8, 2021

I couldn't test it. But today I checked it with Gitea 1.15.3 and it's worse than before. The mirror process is finished, but what I see is

[screenshot]

Try to mirror https://github.com/akka/doc.akka.io on your local instances.

somera (Author) commented Oct 8, 2021

And then, if you click on the trash icon

[screenshot]

the one git process is killed and the CPU usage goes down. But the mirror entry on the admin/mirror page still exists. This should be reproducible for you.

After I deleted the repo, my Gitea needed a restart to remove the two mirror processes from the admin/mirror page.

zeripath (Contributor) commented
I think I've finally discovered the underlying issue here:

If git cat-file is run on a broken git repository, instead of immediately fatalling it will hang until stdin is closed. This results in dangling cat-file processes.

Therefore #17991 (and the backport) will fix this.
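
A minimal sketch of the failure mode per this diagnosis (brokenRepo is a hypothetical path; the point is that `git cat-file --batch` in such a repository only terminates once its stdin is closed, so every error path must close the pipe or kill the command):

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	const brokenRepo = "/tmp/broken.git" // hypothetical broken repository

	cmd := exec.Command("git", "-C", brokenRepo, "cat-file", "--batch")
	stdin, err := cmd.StdinPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Per the diagnosis above, in a broken repository cat-file does not
	// fatal on its own; it sits reading requests from stdin. Without this
	// Close (on every path, including errors), cmd.Wait() below would
	// block and the process would dangle - which is essentially what
	// #17991 guards against.
	stdin.Close()

	fmt.Println("wait:", cmd.Wait())
}
```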

lunny (Member) commented Dec 16, 2021

> I think I've finally discovered the underlying issue here:
>
> If git cat-file is run on a broken git repository, instead of immediately fatalling it will hang until stdin is closed. This results in dangling cat-file processes.
>
> Therefore #17991 (and the backport) will fix this.

I don't think so. Most of the repositories are normal, not broken, git repositories.

zeripath added a commit that referenced this issue Dec 16, 2021
…and other fixes (#17991)

This PR contains multiple fixes. The most important of which is:

* Prevent hang in git cat-file if the repository is not a valid repository 
    
    Unfortunately it appears that if git cat-file is run in an invalid
    repository it will hang until stdin is closed. This will result in
    deadlocked /pulls pages and dangling git cat-file calls if a broken
    repository is tried to be reviewed or pulls exists for a broken
    repository.

    Fix #14734
    Fix #9271
    Fix #16113

Otherwise there are a few small other fixes included which this PR was initially intending to fix:

* Fix panic on partial compares due to missing PullRequestWorkInProgressPrefixes
* Fix links on pulls pages  due to regression from #17551 - by making most /issues routes match /pulls too - Fix #17983
* Fix links on feeds pages due to another regression from #17551 but also fix issue with syncing tags - Fix #17943
* Add missing locale entries for oauth group claims
* Prevent NPEs if ColorFormat is called on nil users, repos or teams.
Chianina pushed a commit to Chianina/gitea that referenced this issue Mar 28, 2022
…and other fixes (go-gitea#17991) - same commit message as above
somera (Author) commented Apr 21, 2022

Tested again with Gitea 1.16.6. Started mirroring https://github.com/akka/doc.akka.io -> it finished, but:

[screenshot]

without any process:

```
$ ps aux | grep akka
git       638257  0.0  0.0   6300   656 pts/2    S+   19:38   0:00 grep --color=auto akka
```

zeripath (Contributor) commented
Will be fixed by #19454

zeripath (Contributor) commented
The problem was finally found after #19207 was merged into main. However, if we had been provided with a pprof goroutine dump from an affected instance (with ENABLE_PPROF=true in the app.ini):

```
wget http://localhost:6060/debug/pprof/goroutine
```

the issue might have been found and investigated much quicker.

I think this means we need to add simple buttons for downloading pprofs etc., and commands to get them from gitea manager - and we should probably just go ahead and provide these.
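
For anyone who wants such a dump ready next time, a sketch of the setting mentioned above (ENABLE_PPROF serves Go's pprof endpoints, on localhost:6060 by default):

```ini
[server]
; expose /debug/pprof/* on localhost:6060
ENABLE_PPROF = true
```

After a restart, `wget http://localhost:6060/debug/pprof/goroutine` captures the goroutine dump while the task appears stuck.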

somera (Author) commented Apr 22, 2022

I've attached the file:

goroutine.zip

I downloaded it after mirroring the affected project again.

go-gitea locked and limited conversation to collaborators Apr 28, 2022