OCPBUGS-39339: gather: simplify service regex for analyze #9073

Open: wants to merge 1 commit into master


r4f4
Contributor

r4f4 commented Oct 4, 2024

When `openshift-install gather bootstrap --dir path/to/workdir` is run from outside the `workdir`, the logs from the bootstrap VM are saved with a `log-bundle-XXXX/` path prefix but are compared against the full path `path/to/workdir` during path substitution for the final log bundle archive:

```
DEBUG Log bundle written to /var/home/core/log-bundle-20241004145703.tar.gz
DEBUG combinedDirectory: c/log-bundle-20241004145703
[...]
DEBUG oldHeaderName: log-bundle-20241004145703/bootstrap/services/approve-csr.json
DEBUG replacing '/c/cluster-log-bundle-20241004145703/' by '' in log-bundle-20241004145703/bootstrap/services/approve-csr.json
DEBUG newHeaderName: log-bundle-20241004145703/bootstrap/services/approve-csr.json
DEBUG no log-bundle-XXXX prefix, adding: c/log-bundle-20241004145703/log-bundle-20241004145703/bootstrap/services/approve-csr.json
```

As a result, the service regex used by the `analyze` command matches no service files, and we get the error

```
ERROR Invalid log bundle or the bootstrap machine could not be reached and bootstrap logs were not collected.
```

even though the bootstrap logs were successfully collected.
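The header-name rewrite traced by the DEBUG lines above can be sketched in Go as follows. This is a hypothetical reconstruction inferred only from the log output: `rewriteHeaderName` and its parameter names are illustrative assumptions, not the installer's actual code. It shows how an entry stored under a bare `log-bundle-XXXX/` prefix ends up with a doubled prefix.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rewriteHeaderName is a HYPOTHETICAL reconstruction of the rewrite traced by
// the DEBUG lines above; it is not the installer's implementation.
func rewriteHeaderName(oldHeaderName, workdirPrefix, combinedDirectory string) string {
	// Strip the work-directory prefix. This is a no-op when the entry was
	// stored under a bare "log-bundle-XXXX/" prefix instead.
	newHeaderName := strings.Replace(oldHeaderName, workdirPrefix, "", 1)
	// Entries that are not already under the combined directory get it
	// prepended, which doubles the log-bundle-XXXX component.
	if !strings.HasPrefix(newHeaderName, combinedDirectory) {
		newHeaderName = path.Join(combinedDirectory, newHeaderName)
	}
	return newHeaderName
}

func main() {
	// Values copied from the DEBUG output above.
	fmt.Println(rewriteHeaderName(
		"log-bundle-20241004145703/bootstrap/services/approve-csr.json",
		"/c/cluster-log-bundle-20241004145703/",
		"c/log-bundle-20241004145703",
	))
}
```

Called with the values from the log, the sketch reproduces the final `adding:` path, with `log-bundle-20241004145703` appearing twice.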

There are 3 possible fixes for this issue:

  1. Change the collector script to save bootstrap logs in the bootstrap VM under the same path as the one passed to the `gather bootstrap` command;
  2. When pulling the logs from the bootstrap VM, rename/move all the files to the path passed to `gather bootstrap`;
  3. Change the service regex to ignore the path prefix.

I have opted to implement option 3, since it involves the fewest changes and is unlikely to introduce serious regressions: at worst, it will make `analyze` fail, but log collection won't be affected.
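To illustrate option 3, here is a minimal Go sketch contrasting a pattern that expects exactly one leading directory with one that ignores any path prefix. Both regexes and the `serviceName` helper are assumptions made for illustration; they are not taken from the installer's `analyze` code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are ILLUSTRATIVE ASSUMPTIONS, not the installer's regexes.
var (
	// strictServiceRegex allows exactly one leading directory (log-bundle-XXXX/).
	strictServiceRegex = regexp.MustCompile(`^[^/]+/bootstrap/services/([^/]+)\.json$`)
	// prefixAgnosticServiceRegex accepts any (or no) leading path components,
	// as long as "bootstrap" is a whole path component.
	prefixAgnosticServiceRegex = regexp.MustCompile(`(?:^|/)bootstrap/services/([^/]+)\.json$`)
)

// serviceName returns the captured service name, or "" if re does not match.
func serviceName(re *regexp.Regexp, entry string) string {
	m := re.FindStringSubmatch(entry)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	entries := []string{
		// Expected layout: a single log-bundle-XXXX/ prefix.
		"log-bundle-20241004145703/bootstrap/services/approve-csr.json",
		// Doubled prefix produced when gathering from outside the workdir.
		"c/log-bundle-20241004145703/log-bundle-20241004145703/bootstrap/services/approve-csr.json",
	}
	for _, e := range entries {
		fmt.Printf("strict=%q relaxed=%q  %s\n",
			serviceName(strictServiceRegex, e),
			serviceName(prefixAgnosticServiceRegex, e), e)
	}
}
```

Run as-is, the strict pattern matches only the first entry, while the prefix-agnostic pattern also matches the doubled-prefix entry from the log. Anchoring on `(?:^|/)` rather than dropping the anchor entirely keeps a directory such as `not-bootstrap/` from matching.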

openshift-ci-robot added the jira/severity-low, jira/valid-reference, and jira/invalid-bug labels Oct 4, 2024
@openshift-ci-robot
Contributor

@r4f4: This pull request references Jira Issue OCPBUGS-39339, which is invalid:

  • expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@r4f4
Contributor Author

r4f4 commented Oct 4, 2024

/hold
Until I drop commit 8c9eb78

openshift-ci bot added the do-not-merge/hold label Oct 4, 2024
@r4f4
Contributor Author

r4f4 commented Oct 4, 2024

/jira refresh

openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label Oct 4, 2024
@openshift-ci-robot
Contributor

@r4f4: This pull request references Jira Issue OCPBUGS-39339, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jinyunma


openshift-ci bot requested a review from jinyunma October 4, 2024 18:50
@patrickdillon
Contributor

/approve
/lgtm

openshift-ci bot added the lgtm label Oct 4, 2024
Contributor

openshift-ci bot commented Oct 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Oct 4, 2024
@r4f4
Contributor Author

r4f4 commented Oct 4, 2024

e2e-aws-ovn: `gather bootstrap` is still working as expected:

```
time="2024-10-04T20:48:58Z" level=debug msg="Log bundle written to /var/home/core/log-bundle-20241004204852.tar.gz"
time="2024-10-04T20:49:00Z" level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.ci-op-28tv02km-3266f.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 54.245.87.64:6443: connect: connection refused"
time="2024-10-04T20:49:00Z" level=error msg="Bootstrap failed to complete: Get \"https://api.ci-op-28tv02km-3266f.origin-ci-int-aws.dev.rhcloud.com:6443/version\": dial tcp 54.245.87.64:6443: connect: connection refused"
time="2024-10-04T20:49:00Z" level=error msg="Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane."
time="2024-10-04T20:49:00Z" level=info msg="Flag --icsp-file has been deprecated, support for it will be removed in a future release. Use --idms-file instead."
time="2024-10-04T20:49:00Z" level=info msg="Rendering api manifests..."
time="2024-10-04T20:49:00Z" level=info msg="Force failing"
time="2024-10-04T20:49:00Z" level=info msg="Bootstrap gather logs captured here \"/tmp/installer/log-bundle-20241004204852.tar.gz\""
```

r4f4 force-pushed the gather-bootstrap-path-fix branch from 8c9eb78 to ca029f9 October 4, 2024 21:29
openshift-ci bot removed the lgtm label Oct 4, 2024
Contributor

openshift-ci bot commented Oct 4, 2024

New changes are detected. LGTM label has been removed.

@r4f4
Contributor Author

r4f4 commented Oct 4, 2024

/hold cancel
Dropped the DNM commit.

openshift-ci bot removed the do-not-merge/hold label Oct 4, 2024
Contributor

openshift-ci bot commented Oct 5, 2024

@r4f4: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
