Error: routers/repo/http.go:430:serviceRPC() [E] Fail to serve RPC(upload-pack): exit status 128 #9006
Comments
We need a friendlier log message for that error.
I'm going to make an educated guess:
I'll try to decode the log properly at some point - it's just ASCII bytes. If you could do that and put it up, that would be great, and I'll see if I can put up a PR to make the log more reasonable. (It would be a single-line PR.)
Ha, OK that's a fun one. It works out to
which base64-decodes to
which is indeed what I saw from the strace. So that confirms that :) But it would be great if it correlated which repo/request it was coming from, as I still have no idea. I don't think any of the things you suggested have changed at all. This is part of an 8-node cluster fronted by haproxy, and the other nodes don't seem to be exhibiting this issue. I'm really not sure when it started, which I know isn't helpful :(
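For illustration, the general shape of that decoding step; the byte values and base64 string below are placeholders, not the actual contents of the log:

```sh
# The log prints the payload as a list of ASCII byte values; turn them back
# into text, then base64-decode the embedded string. Placeholder values only.
printf '%b' "$(printf '\\%03o' 48 48 48 56 78 65 75)"   # -> 0008NAK (pkt-line style text)
echo 'MDAwOE5BSw==' | base64 -d                          # -> 0008NAK
```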
This is a trace of the git upload-pack process that seems to die and cause this issue: 16131.trace.txt
I'm seeing the issue that @ianw is describing. We have a local repo that passes […]. I'm using […]
Maybe it's some git hook that takes too long? Git would abort on the client side, and the server would show that error when the hook finishes.
@guillep2k I don't think so; I'm not aware of any hooks really, and you can see in the trace I put up before that it happens very quickly? Edit: looking again, there are no timestamps, so this is not at all obvious :) but it's not stalling, AFAICT.
Well, the remote end (i.e. the client) seems to be the one closing the connection, so if it's not because of a timeout, something in the response must be upsetting it. Perhaps the client has the problem? I imagine the following test: clone the repo on another machine (B) and add (B) as a remote for (A); then push your changes from (A) to (B) and see if you succeed; after that you can also try pushing from (B) to Gitea. Otherwise it's worth looking for more info on the client side to see what's making it abort the connection.
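A sketch of that test, with placeholder host names and paths:

```sh
# On machine B: take a bare copy of the repo from machine A
git clone --bare machineA:/path/to/repo.git /srv/test/repo.git

# On machine A: add B as a plain SSH remote and push there, bypassing Gitea
git remote add machineB machineB:/srv/test/repo.git
git push machineB master

# If that succeeds, try pushing from B to Gitea over HTTP as usual
git -C /srv/test/repo.git push https://gitea.example.org/org/repo.git master
```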
The client still has an open socket to the gitea server:
The last thing git (with GIT_CURL_VERBOSE) prints is:
So AFAICT the client isn't dropping the connection; an strace on the client doesn't show any calls to close() on the fd.
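For reference, the kind of client-side checks described above; the port, remote name and PID are placeholders:

```sh
ss -tnp | grep ':3000'                           # confirm the socket to the gitea host is still open
GIT_CURL_VERBOSE=1 git fetch origin              # watch the last HTTP exchange git reports
sudo strace -f -tt -e trace=close -p <git-pid>   # check whether git ever close()s the fd
```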
OK, but you can see that:
Maybe your repo is not on a local file system?
@guillep2k all the repos are local; gitea is running in a container and has access to the repos that way. The only thing pushing to gitea is gerrit as it replicates; otherwise it's read traffic. The interesting thing is that there's a TCP load-balancer in front of each gitea node. However, haproxy does not appear to be showing any sort of errors.
I don't understand: why are there two instances of gitea? A load balancer suggests a single file system shared between the instances, and that should give you all kinds of problems (not necessarily this one, however). Anyway, it doesn't seem to be Gitea but git that's failing here. As a last measure, I'd consider upgrading to 1.9.6. There has been some rework in the way Gitea handles git resources, specifically to avoid keeping open handles (#8901, #8958). This may be related to your issue.
@guillep2k there's no shared file system in this case; each gitea node is on its own server. gerrit's repos are the one true source, and it replicates changes out to the gitea nodes -- it is the only writer. I agree that it looks like git is dying here, but it's very odd that this just started happening. We are seeing this across multiple servers, however; it's not just one. I agree we're not being super helpful here, as I don't know exactly when this started. I've tried to dump the connection, but it's all just encrypted binary stuff which will take a whole heap of effort to decode in a tracer, and I didn't see anything obvious. I think we'll have to try upgrading at this point, and if we see the issue continue with 1.9.6, dig deeper.
Update system-config from branch 'master'

- Merge "gitea: Use 1.9.6"
- gitea: Use 1.9.6

  We are seeing issues with hanging git connections discussed in [1].
  It is suggested to upgrade to gitea 1.9.6; do that.

  [1] go-gitea/gitea#9006

  Change-Id: Ibbbe73b5487d3d01a8d7ba23ecca16c2264973ca
@ianw and team upgraded to 1.9.6 and the problem persists. With the repo below we've recreated the "hang" on a number of Linux distros and git versions. Grab the repo @ https://ozlabs.org/~tony/nova_bad.tar.gz, unpack it, and then run something like:
It's entirely possible that there is a problem with that repo that […]. Any ideas of what else to try? If not, I'll try using a local gitea server at some point next week.
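The exact command isn't preserved in this extract; a hypothetical way to exercise the same code path, based on the server-side invocation shown in the description below, might look like this (the directory name inside the tarball is an assumption):

```sh
wget https://ozlabs.org/~tony/nova_bad.tar.gz
tar xzf nova_bad.tar.gz

# A plain local clone already runs upload-pack against the suspect repo
git clone nova_bad nova_test

# Or drive upload-pack directly, similar to what Gitea's HTTP handler runs
git upload-pack --stateless-rpc --advertise-refs nova_bad | head -c 200
```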
I suspect that I have the same problem: an existing gitea repo was working fine until some commit. After that, pushing hangs and the git remote-http process hogs a complete CPU core. After 8 minutes a timeout occurs. I deleted the repo and tried to do a clean push -- same problem.
Unfortunately not - I know it is frustrating.
Can you share the project so that @lunny is able to triage this?
I assume it would be helpful if you could share your repo and identify the commit from which point the push to gitea fails.
Hi peeps! So I finally got round to looking at this. First of all, I can't replicate this on master with sqlite, but I do suspect that this might be another example of the TCP port exhaustion issue, as I see that you're using MySQL. You will need to set MAX_OPEN_CONNS, MAX_IDLE_CONNS and CONN_MAX_LIFETIME. In particular you should set MAX_OPEN_CONNS = MAX_IDLE_CONNS.
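A minimal sketch of those settings in app.ini; the values are illustrative, not a recommendation from this thread:

```ini
; [database] section of app.ini -- placeholder values, tune for your database
[database]
DB_TYPE           = mysql
MAX_OPEN_CONNS    = 20
MAX_IDLE_CONNS    = 20   ; keep equal to MAX_OPEN_CONNS, as suggested above
CONN_MAX_LIFETIME = 3m
```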
For me it seems that the issue was fixed by forcing the workdir permissions on all files to be reset. It seems it was not SQLite but the git repository itself.
@everflux if the permissions on the git repository were messed up, I could imagine that git would fail in this way.
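A hedged sketch of what resetting the repository permissions could look like; the path and service user below are assumptions about a typical install, not taken from this thread:

```sh
# Assumed default layout: repositories under /var/lib/gitea, owned by a 'git' user
sudo chown -R git:git /var/lib/gitea/data/gitea-repositories

# Capital X keeps the execute bit on directories and existing executables
# (e.g. hook scripts) while tightening everything else
sudo chmod -R u+rwX,g+rX,o-rwx /var/lib/gitea/data/gitea-repositories
```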
Description
One of our gitea hosts has started showing a constant error:
The numbers are always the same, and it doesn't tell me which repos it is associated with. A slightly longer extract:
I have deleted all the repos and recreated them (using "Reinitialize all missing Git repositories for which records exist"), but it still happens.
I then started to strace the gitea process to see what was going on. I managed to catch what I think is the cause of this error:
This command matches what would be running at https://github.com/go-gitea/gitea/blob/master/routers/repo/http.go#L418 and has the 128 exit code, so I think it matches up. It would be really helpful if this were expressed in the error message.
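For reference, a sketch of the kind of trace used here; the flags and grep filter are illustrative, not the exact command from the report:

```sh
# Follow children of the running gitea binary and watch for the
# git upload-pack invocation and its exit status
sudo strace -f -tt -s 256 -e trace=execve,wait4 -p "$(pidof gitea)" 2>&1 \
  | grep -E 'upload-pack|exited with'
```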
I'm investigating the host to see if there are any networking-type issues that might be affecting clients, although it doesn't look like it to me.
One user who seemed to hit this said that their clone was just stuck for hours; they didn't get a specific error message. They were using git 2.21.0.