
Cherrypicker missing many (all?) cherrypicks from label #2309

Open
howardjohn opened this issue Jan 28, 2020 · 15 comments

@howardjohn
Member

Example: istio/istio#20543, istio/istio#20442

Do we need to set up config per-branch for this label? I could not find anything where we specify this, so I assumed it applies to any cherrypick/* label. If the labels don't work, we should just remove the labels and have everyone use the command, since it seems to work more reliably.

@clarketm
Member

clarketm commented Jan 28, 2020

Do we need to set up config per-branch for this label? I could not find anything where we specify this, so I assumed it applies to any cherrypick/* label. If the labels don't work, we should just remove the labels and have everyone use the command, since it seems to work more reliably.

@howardjohn

There is no special configuration; the plugin simply looks for the cherrypick/ label prefix. Interestingly, not all 1.5 cherrypicks were failures (istio/istio#20244, istio/istio#20408, istio/istio#20530), though a large portion were 🤔. I would surmise that it is not an issue specifically with the label but more generally with the plugin, since this is not the first occurrence of it flaking: #1598. Moreover, the slash-comment version (i.e. /cherrypick ...) seems to be prone to error as well: istio/istio#20511 (comment), which further leads me to this conclusion. There is, of course, a possibility that this is environmental (i.e. GitHub not sending webhooks), but upon initial inspection I do not think that is the case. Regardless, I recommend going with the slash-comment version if it is more reliable, at least until I can delve into the logs, determine the cause, and provide a solution.
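
For reference, the label handling amounts to a prefix match. A minimal Go sketch of the idea (a hypothetical helper for illustration, not the actual cherrypicker source):

// Illustrative only: resolve the target branch from a cherrypick/<branch>
// label by prefix match, roughly the behavior described above.
package main

import (
	"fmt"
	"strings"
)

const cherryPickLabelPrefix = "cherrypick/"

// targetBranch returns the branch named in a cherrypick label, or "" if the
// label is not a cherrypick label.
func targetBranch(label string) string {
	if !strings.HasPrefix(label, cherryPickLabelPrefix) {
		return ""
	}
	return strings.TrimPrefix(label, cherryPickLabelPrefix)
}

func main() {
	for _, l := range []string{"cherrypick/release-1.5", "kind/bug"} {
		fmt.Printf("%s -> %q\n", l, targetBranch(l))
	}
}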

@ericvn
Contributor

ericvn commented Jan 28, 2020

FYI: istio/istio.io#6332

@clarketm
Member

clarketm commented Jan 28, 2020

All of the failures that I am seeing thus far are git fetch errors (related: https://github.com/openshift/release/issues/6110, kubernetes/test-infra#15225).

e.g. istio/istio#20543

{
 insertId: "170kjw2f50zixp"  
 jsonPayload: {
  error: "git fetch error: exit status 128. output: fatal: remote error: upload-pack: not our ref 2df38397f8b7ef9e1415dd777ddce3c64a0d149d
fatal: The remote end hung up unexpectedly
"   
  event-GUID: "ecdba800-4157-11ea-9f97-d30febca7c89"   
  event-type: "pull_request"   
  level: "info"   
  msg: "Cherry-pick failed."   
  plugin: "cherrypick"   
 }
 labels: {
  compute.googleapis.com/resource_name: "gke-prow-prod-pool-fb335f1f-5rds"   
  container.googleapis.com/namespace_name: "default"   
  container.googleapis.com/pod_name: "cherrypicker-6895fb69fc-mcchq"   
  container.googleapis.com/stream: "stderr"   
 }
 logName: "projects/istio-testing/logs/cherrypicker"  
 receiveTimestamp: "2020-01-27T22:54:27.758925058Z"  
 resource: {
  labels: {
   cluster_name: "prow"    
   container_name: "cherrypicker"    
   instance_id: "294245110686867759"    
   namespace_id: "default"    
   pod_id: "cherrypicker-6895fb69fc-mcchq"    
   project_id: "istio-testing"    
   zone: "us-west1-a"    
  }
  type: "container"   
 }
 severity: "ERROR"  
 timestamp: "2020-01-27T22:54:22Z"  
}

cc @cjwagner

@cjwagner
Member

Well, I don't see a 2df38397f8b7ef9e1415dd777ddce3c64a0d149d commit anywhere in the istio org, so it makes sense that the fetch fails. I wonder where that came from... It looks like we only fetch with git when checking out the repo and use GitHub to get the PR patch?

@howardjohn
Member Author

That SHA is in my fork (and the PR referenced is from me): howardjohn/istio@2df3839. I am not exactly sure where it's from, but it's not completely random?

@cjwagner
Member

Interesting, that commit is a merge of the merge commit of istio/istio#20543 and some other commit. I don't see anywhere we try to clone the user's fork and that commit shouldn't matter anyways since it is entirely separate from the PR...

@clarketm
Member

clarketm commented Jan 29, 2020

There are plenty more, but here are a few of the other commits scraped from the logs which also failed on fetch: istio/istio@c36fd36, istio/istio@975734e, istio/istio@d9af9f8. They all appear to be merge commits.

@cjwagner
Member

In particular they are all merge commits in which one of the 2 parent commits is the merge commit of a PR that requested a cherry pick.
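
A quick way to confirm that pattern is to print a suspect commit's parents; a minimal sketch (assumes git is on PATH and a local clone that actually contains the SHA, e.g. the author's fork):

// Illustrative check: print a commit followed by its parents so a merge
// commit (two parents) is easy to spot.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	sha := "2df38397f8b7ef9e1415dd777ddce3c64a0d149d" // one of the failing refs above
	// `git rev-list --parents -n 1 <sha>` prints "<sha> <parent1> [<parent2>]".
	out, err := exec.Command("git", "rev-list", "--parents", "-n", "1", sha).CombinedOutput()
	if err != nil {
		fmt.Printf("rev-list failed: %v\n%s", err, out)
		return
	}
	fmt.Printf("commit and parents: %s", out)
}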

@clarketm
Member

clarketm commented Jan 31, 2020

Running a load test using the following code and git version 2.24.1, I was able to reproduce the error for istio/istio in an isolated environment. That tells me it is likely a git or GitHub issue:

# Trace git commands so fetch failures show the underlying fetch-pack invocation.
export GIT_TRACE=1

tmp_dir=$(mktemp -d -t ci-XXXXXXXXXX)
# Clean up the temporary clone directory on exit.
trap 'rm -rf "$tmp_dir"' EXIT

org="istio"
format_string="https://github.com/%s/%s.git"

repos=(
    "istio.io"
    "istio"
    "release-builder"
    "proxy"
    "envoy"
    "cni"
    "operator"
    "client-go"
)

pushd "$tmp_dir"

for repo in "${repos[@]}"; do
    src="$(printf "$format_string" "$org" "$repo")"

    # Mirror-clone, then fetch again, exercising the same clone/fetch path
    # the cherrypicker uses.
    git clone --mirror "$src"

    pushd "$(basename "$src")"
    git fetch
    popd
done

popd

with error:

15:54:11.760238 git.c:439               trace: built-in: git clone --mirror https://github.com/istio/istio.git
Cloning into bare repository 'istio.git'...
warning: templates not found in /Users/clarketm/.git-templates
15:54:11.767597 run-command.c:663       trace: run_command: git-remote-https origin https://github.com/istio/istio.git
15:54:12.697024 run-command.c:663       trace: run_command: git fetch-pack --stateless-rpc --stdin --lock-pack --thin --check-self-contained-and-connected --cloning --no-progress https://github.com/istio/istio.git/
15:54:12.704064 git.c:439               trace: built-in: git fetch-pack --stateless-rpc --stdin --lock-pack --thin --check-self-contained-and-connected --cloning --no-progress https://github.com/istio/istio.git/

fatal: remote error: upload-pack: not our ref 39c3dbcd9b9c18c21ea97898922749528bbf96fe

Note: istio/istio@39c3dbc follows the same pattern as above.

@clarketm
Member

clarketm commented Jan 31, 2020

@stevekuznetsov @petr-muller

I see that both openshift (https://github.com/openshift/release/issues/6110) and prow (kubernetes/test-infra#15225) have been experiencing pain related to this error (remote error: upload-pack: not our ref ...) in the recent past (~2-3 months). I am curious whether anyone has found a solution, workaround, or cause (short of increasing retries)?

@stevekuznetsov

I wonder if that merge commit is somehow related to in-repo config fetching?

The upload-pack: not our ref error is in general just related to GitHub's REST API racing with what's available over git from their remotes. The workaround in most cases is just to try again; the most common case is when the REST API/webhooks tell us about a refspec that's not pullable (yet) from the git servers.

@clarketm
Member

clarketm commented Jan 31, 2020

I wonder if that merge commit is somehow related to in-repo config fetching?

I am still unsure about the merge commit origination. I will need to dig into the in-repo code to see.

The upload-pack: not our ref error is in general just related to GitHub's REST API racing with what's available over git from their remotes. The workaround in most cases is just to try again; the most common case is when the REST API/webhooks tell us about a refspec that's not pullable (yet) from the git servers.

That makes sense. So it sounds like this is less of a bug and more of a necessary side effect of eventual consistency and replication. I am fine with more retries or a longer backoff factor; the problem is that these are hardcoded to 3 retries with a relatively short backoff factor of 2 in the git client. That is apparently not enough, since the missed cherrypicks above only happen after all retries are exhausted.

I guess a good first step is to try to identify, with certainty, what/who creates the merge commit(s). Then I can tweak the retryCmd in k8s/test-infra to make the retry count and/or backoff factor configurable, and increase those values.
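
For illustration, a rough Go sketch of what a configurable retry with exponential backoff could look like (hypothetical names, not the actual retryCmd in test-infra):

// Hypothetical sketch of a configurable retry with exponential backoff,
// in the spirit of making the git client's retry count and backoff tunable.
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// retryCommand runs the command up to attempts times, sleeping
// initialDelay * backoffFactor^n between failed attempts.
func retryCommand(attempts int, initialDelay time.Duration, backoffFactor float64, name string, args ...string) error {
	delay := initialDelay
	var err error
	for i := 0; i < attempts; i++ {
		var out []byte
		out, err = exec.Command(name, args...).CombinedOutput()
		if err == nil {
			return nil
		}
		fmt.Printf("attempt %d failed: %v\n%s", i+1, err, out)
		if i < attempts-1 {
			time.Sleep(delay)
			delay = time.Duration(float64(delay) * backoffFactor)
		}
	}
	return err
}

func main() {
	// e.g. 6 attempts with a 2s initial delay doubling each time (2s, 4s, 8s, ...).
	if err := retryCommand(6, 2*time.Second, 2, "git", "fetch", "origin"); err != nil {
		fmt.Println("giving up:", err)
	}
}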

@cjwagner
Member

Still sounds like there is a bug somewhere given that we still don't know why the unrelated merge commit is being fetched. +1 for identifying that first before trying to work around with retries.

Since this looks like a git/GitHub bug and since you have a simple script to reproduce this it is probably time to escalate to GH support?

@istio-policy-bot added the lifecycle/needs-triage label (indicates a new PR or issue needs to be triaged) Jan 31, 2020
@clarketm
Member

clarketm commented Feb 1, 2020

Still sounds like there is a bug somewhere given that we still don't know why the unrelated merge commit is being fetched. +1 for identifying that first before trying to work around with retries.

Since this looks like a git/GitHub bug and since you have a simple script to reproduce this it is probably time to escalate to GH support?

GitHub support ticket created. In the meantime, I will add support for more retries (or a longer time between them) to see if that mitigates the issue.

@alvaroaleman

The git fetch error is emitted in the git client's Clone, which doesn't get any kind of revision passed in.

The cherrypicker fetches the patch from the GitHub API and not via the git client: https://github.com/kubernetes/test-infra/blob/b2d632960440e47f671dd3b997709aa299911456/prow/external-plugins/cherrypicker/server.go#L378

This doesn't really answer the question of why exactly this happens (maybe revisions that no longer exist are still being advertised by GitHub?), but it doesn't look like a bug in our code.
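
For context, a PR's patch can be fetched straight from the GitHub REST API by requesting the patch media type on the pull request endpoint; a minimal sketch along those lines (unauthenticated, public repo; not the plugin's actual client code):

// Minimal sketch: download a PR's patch from the GitHub REST API instead of
// fetching it over git (public repo, unauthenticated; add a token for private
// repos or higher rate limits).
package main

import (
	"fmt"
	"io"
	"net/http"
)

func fetchPatch(org, repo string, prNumber int) (string, error) {
	url := fmt.Sprintf("https://api.github.com/repos/%s/%s/pulls/%d", org, repo, prNumber)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	// Requesting the patch media type returns the raw git format-patch output.
	req.Header.Set("Accept", "application/vnd.github.v3.patch")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	patch, err := fetchPatch("istio", "istio", 20543)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("downloaded %d bytes of patch\n", len(patch))
}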
