Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PAT permissions issues for CD auto-commit for docs pages #1726

Closed
aj-stein-nist opened this issue Mar 27, 2023 · 10 comments · Fixed by #1731
Closed

PAT permissions issues for CD auto-commit for docs pages #1726

aj-stein-nist opened this issue Mar 27, 2023 · 10 comments · Fixed by #1731
Assignees
Labels
bug Scope: CI/CD Enhancements to the project's Continuous Integration and Continuous Delivery pipeline.

Comments

@aj-stein-nist
Copy link
Contributor

Describe the bug

The use of a GitHub PAT associated with the oscalbuilder, with only repo:public_repo permissions, is insufficient to complete the builds in GitHub Actions. On the final stage in different workflows, where updated model document and website content are committed back to the repo, the token lacks sufficient permissions and a git clone operation is denied.

Details as reported in a related issue can be found in #1698 (comment).

Who is the bug affecting

NIST OSCAL developer team wishing to release code, documentation, and website enhancements.

What is affected by this bug

CI/CD, OSCAL Content, Documentation, Website

How do we replicate this issue

  1. Set up a developer fork of the repo.
  2. Set up a PAT for the fork as explained in the screenshot in Intermittent HTTP 429 errors from git push in CI/CD when auto-committing docs and converted content #1698 (comment) and configure it for username/OSCAL fork.
  3. Run the generate website workflow from your fork.
  4. Review logs and see git clone failures on the generate website workflow, even without the presence of 429 errors.

Expected behavior (i.e. solution)

The git clone operations are successfully completed without error, commits are made, and the the committed changes are pushed back to the repo so the website can be updated (this is how pages.nist.gov works).

Other comments

No response

@aj-stein-nist aj-stein-nist added bug Scope: CI/CD Enhancements to the project's Continuous Integration and Continuous Delivery pipeline. labels Mar 27, 2023
@aj-stein-nist aj-stein-nist self-assigned this Mar 27, 2023
aj-stein-nist added a commit to aj-stein-nist/OSCAL-forked that referenced this issue Mar 27, 2023
aj-stein-nist added a commit to aj-stein-nist/OSCAL-forked that referenced this issue Mar 27, 2023
aj-stein-nist added a commit to aj-stein-nist/OSCAL-forked that referenced this issue Mar 27, 2023
aj-stein-nist added a commit to aj-stein-nist/OSCAL-forked that referenced this issue Mar 27, 2023
@aj-stein-nist
Copy link
Contributor Author

Now I am more confused, it seems that the repo:public_repo scope is sufficient on my fork, I guess I need to go back and determine why that is not the case on usnistgov/OSCAL, if at all?

@aj-stein-nist
Copy link
Contributor Author

Now I am wondering if something about regenerating the token makes a difference, because lo and behold, even running it on the official repo, not a fork, is working without failing with simple repo:public_repo permissions, but I might have to make a minor change to the website to test this more thoroughly. Our opportunity may come in the form of merging #1662 later in this sprint as we come to a close, so I will test a little more and leave this under review until further notice.

Also, I will add a SOP page to the wiki section around debugging and troubleshooting this, as it seems very much necessary for others in this position moving forward.

@aj-stein-nist aj-stein-nist changed the title PAT scopes insufficient for CD auto-commit for docs pages PAT permissions issues for CD auto-commit for docs pages Mar 29, 2023
@aj-stein-nist
Copy link
Contributor Author

OK, a workflow run on an unrelated branch on developer after merging a fresh PR with the updated token failed once then worked. The updated token helped, but also the oscalbuilder account associated with the token used for committing needs the Maintain permissions on the usnistgov/OSCAL repo to bypass branch protections for direct commits to the protected branch. (It was already set up to be allowed to edit that branch, but it does not seem to be enough, see screenshot below.)

image

@aj-stein-nist
Copy link
Contributor Author

Well, that didn't go as planned, need to examine this further ... again, sigh.

@aj-stein-nist aj-stein-nist reopened this Mar 30, 2023
@aj-stein-nist
Copy link
Contributor Author

Previous debugging approaches and a prior solution didn't seem to change it anymore. I guess I will have to dig some more, but I am definitely opening a GH support ticket because the amount of effort I expended on this is now very excessive. I will test further more with the 429 header check feature branch of Chris but testing off my fork and a complete clone that is not fork connected does not afford the same granularity of branch protections, so I cannot completely test this with oscalbuilder on aj-stein-nist/OSCAL either (as opposed to my fork I used for a year or more now, relocated to aj-stein-nist/OSCAL-forked). Either way the subset of permissions on a personal fork, as opposed to a repo, is still problematic.

More to follow.

aj-stein-nist added a commit to aj-stein-nist/OSCAL that referenced this issue Mar 30, 2023
@aj-stein-nist
Copy link
Contributor Author

Created ticket #2085480 with GitHub for a consult.

@aj-stein-nist
Copy link
Contributor Author

Been working on this again in the morning, and using the nice work in #1698 and mimicking it locally on my workstation, oscalbuilder is way under the rate limit (even run starting fresh this morning from 60, it goes down once per run, 59, 58, and is not even close to hitting the limit). So I guess need to mimic using the token locally it seems to figure what gives here.

@aj-stein-nist
Copy link
Contributor Author

OK, first run with a test repo was successful.

https://github.com/ImportantFederalAgencyTest/OSCAL/actions/runs/4575711091

Now need to set up branch protections, but that also means I need to make it public, or buy an enterprise org. I will opt for the former. :-)

@aj-stein-nist
Copy link
Contributor Author

aj-stein-nist commented Mar 31, 2023

That didn't take long, we are not back to where we need to be.

https://github.com/ImportantFederalAgencyTest/OSCAL/actions/runs/4576185322/jobs/8080201209

After this failed build, I checked my rate limits and it seems far below what is necessary, so this 429 is still confusing:

me@computer:~/code/oscal/branches/me/main$ curl -s -I -H "Accept: application/vnd.github+json" https://api.github.com/users/oscalbuilder | grep -E "x-ratelimit-remaining|x-ratelimit-limit|x-ratelimit-used|^HTTP"
HTTP/2 200 
x-ratelimit-limit: 60
x-ratelimit-remaining: 45
x-ratelimit-used: 15
me@computer:~/code/oscal/branches/me/main$ curl -s -I -H "Authorization: token ${GH_PAT}" -H "Accept: application/vnd.github+json" https://api.github.com/users/oscalbuilder | grep -E "x-ratelimit-remaining|x-ratelimit-limit|x-ratelimit-used|^HTTP"
HTTP/2 200 
x-ratelimit-limit: 5000
x-ratelimit-remaining: 4998
x-ratelimit-used: 2

@aj-stein-nist
Copy link
Contributor Author

OK, @Compton-NIST and @nikitawootten-nist were very helpful. I learned some things about an argument added by yours truly in b08851c. It appears what we are passing to the peaceiris/actions-gh-pages incorrectly so it was operating with the GITHUB_TOKEN not the PAT after all.

https://github.com/peaceiris/actions-gh-pages#%EF%B8%8F-set-runners-access-token-github_token

So next steps to integrate into OSCAL main with a PR from yours truly.

aj-stein-nist added a commit that referenced this issue Mar 31, 2023
By doing this, we correctly the PAT usage and not ironically use an existing,
but improperly permissioned GITHUB_TOKEN provided as a context machine identity
for all runs of all workflows, this should fix the builds and stop the cryptic
HTTP 429 rate limit error response. It's cryptic because you get a 429 response
after one single API operation (with git clone) because the token is wrong.
aj-stein-nist added a commit that referenced this issue Mar 31, 2023
As part of the troubleshooting work, GH docs do indicate scanning links from the GHA
runners can potentially cause rate limiting. We have automated nightly scans and we
review code changes as part of PRs. We can forgot commit-by-commit link scanning as
a short-to-medium term mitigation and enable it again later.
aj-stein-nist added a commit that referenced this issue Mar 31, 2023
By doing this, we correctly the PAT usage and not ironically use an existing,
but improperly permissioned GITHUB_TOKEN provided as a context machine identity
for all runs of all workflows, this should fix the builds and stop the cryptic
HTTP 429 rate limit error response. It's cryptic because you get a 429 response
after one single API operation (with git clone) because the token is wrong.
aj-stein-nist added a commit that referenced this issue Mar 31, 2023
As part of the troubleshooting work, GH docs do indicate scanning links from the GHA
runners can potentially cause rate limiting. We have automated nightly scans and we
review code changes as part of PRs. We can forgot commit-by-commit link scanning as
a short-to-medium term mitigation and enable it again later.
aj-stein-nist added a commit that referenced this issue Apr 3, 2023
…1726) (#1731)

* Correct GHA action-gh-pages argument for #1726

By doing this, we correctly the PAT usage and not ironically use an existing,
but improperly permissioned GITHUB_TOKEN provided as a context machine identity
for all runs of all workflows, this should fix the builds and stop the cryptic
HTTP 429 rate limit error response. It's cryptic because you get a 429 response
after one single API operation (with git clone) because the token is wrong.

* Remove lychee scans during site build for in #1726

As part of the troubleshooting work, GH docs do indicate scanning links from the GHA
runners can potentially cause rate limiting. We have automated nightly scans and we
review code changes as part of PRs. We can forgot commit-by-commit link scanning as
a short-to-medium term mitigation and enable it again later.

* Add @Compton-NIST's rate limit checks from #1698 for #1726.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Scope: CI/CD Enhancements to the project's Continuous Integration and Continuous Delivery pipeline.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant