[bug] ”Generate Builder“ step fails #942
Comments
Hi! Thanks for the report: we've seen this and are working on a complete fix. This is the same issue that @stephenfuqua reported: #876 (comment)
In the meantime, enable
What we know is that Rekor (the transparency log in which the builder's provenance is recorded) made breaking changes that broke our verification. Here's what happened in this particular case when we verified the builder provenance's inclusion in the log: we verified the inclusion proof against the root hash, and then verified the root hash against the current signed tree head. After Rekor sharded (rotated trees) in production, the tree head no longer corresponds to the tree that the entry is in, which caused a verification error. In particular, the tree size given in the entry's inclusion proof is LARGER than the current tree size after rotation.
This will fix this: slsa-framework/slsa-verifier#277, and we'll backport the changes to the |
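For readers unfamiliar with the check described above, here is a minimal Python sketch of it. It is not the slsa-verifier or Rekor code: `root_from_inclusion_proof` follows the generic inclusion-proof algorithm from RFC 9162 (with RFC 6962 hashing), and `check_entry` is a hypothetical helper that only illustrates why a proof whose tree size is larger than the current tree head cannot be verified against that head. Signature verification of the tree head is omitted.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the entry bytes.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962 interior node hash: SHA-256 over a 0x01 prefix plus both children.
    return hashlib.sha256(b"\x01" + left + right).digest()


def root_from_inclusion_proof(index, tree_size, leaf, path):
    """Recompute the root hash implied by an inclusion proof (RFC 9162 style)."""
    if not 0 <= index < tree_size:
        raise ValueError("leaf index outside the tree")
    fn, sn = index, tree_size - 1
    r = leaf
    for p in path:
        if sn == 0:
            raise ValueError("inclusion proof is too long")
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where the current node has no right sibling.
            while fn and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    if sn != 0:
        raise ValueError("inclusion proof is too short")
    return r


def check_entry(index, proof_size, leaf, path, proof_root, head_size, head_root):
    """Hypothetical helper: verify the proof, then relate it to a tree head."""
    if root_from_inclusion_proof(index, proof_size, leaf, path) != proof_root:
        raise ValueError("inclusion proof does not match its root hash")
    if proof_size > head_size:
        # The situation in this thread: after a rotation, the current head
        # describes a different, smaller tree than the one holding the entry.
        raise ValueError("proof tree is larger than the current tree head")
    if proof_size == head_size and proof_root != head_root:
        raise ValueError("same tree size but different root hash")
    # If proof_size < head_size, a consistency proof would be needed (not shown).


# Usage: a three-leaf tree built by hand; the entry at index 2 verifies.
la, lb, lc = leaf_hash(b"a"), leaf_hash(b"b"), leaf_hash(b"c")
root = node_hash(node_hash(la, lb), lc)
check_entry(2, 3, lc, [node_hash(la, lb)], root, 3, root)
```

Checking the same entry against a head of size 2 raises the "larger than the current tree head" error, which mirrors the failure mode reported here.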
Arg! Good question: I think we purposely omitted it from the documentation, since we mostly rely on it to test builders at head that don't have provenance (this option builds from source, so it increases the time a provenance generation run takes). This is a reusable workflow input argument (so
|
sounds safe enough. Let's do that. |
It was Furthermore, we’ll refactor our own release actions such that a failure in the SLSA tools won’t break our own workflows (like it did this time). Which brings me to an important question: would you consider these SLSA workflows stable, or not yet? What’s the current roadmap? |
The workflows themselves are stable but we rely on the public Rekor instance as the transparency log for now and we have unintentionally been the guinea pig/canary for a few issues recently (issues with public server discovered by this project). We may support folks using their own Rekor instance as part of #34 but we haven't prioritized that on the roadmap yet. @asraa Are there any docs on the current support level/SLA for the public Rekor and/or any docs on future plans wrt SLAs that we can point @jenstroeger to? |
Yes -- good question: like Ian said, we rely on Sigstore's public instance for the transparency log. It's projected to go fully GA in mid-October, which is why we're seeing a lot of last-minute breaking changes as the API stabilizes. At GA, there will be an SLA for Rekor: https://docs.google.com/document/d/1lhcnNGA9yuNSt0W72fEiCPhLN4M6frR_eUolycLIc8w/edit#heading=h.nq2r70ogoyyl (you may need to join [email protected]). The highlight is 99.9% availability for Rekor's endpoint. With regard to the type of error we saw here (incompatible changes that broke client code): I expect that after GA Sigstore will not break existing clients. |
The Is there going to be a patch version for the |
Yes, we'll cut a patch soon once we've fixed some e2e tests that are failing. |
Thanks @ianlewis. So, that would also mean It's an interesting case because the public Rekor instance hasn't/can't be pinned as a dependency and if there is a breaking change there, we can't really rely on semantic versioning to decide what needs to be updated. I would have expected a pinned version of |
This also broke the Can I get a rough ETA on when the new version of the action will be cut? Trying to decide if I should roll back provenance generation to get the release out or just wait a bit. |
IIUC yes that's right.
Yes. Part of the reason is that the public Rekor itself isn't GA but we are relying on it. Unfortunately Rekor is making changes that break clients and will likely continue to do so until it's GA. I added #958 and #959 to add docs on our use of Rekor and to better handle transient errors.
I'm not sure I can commit to anything personally but I'll see if we can get a release out by EOW. |
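As an illustration of what "better handle transient errors" can mean in general, here is a minimal retry-with-backoff sketch in Python. It is an assumption about the general technique, not the approach taken in #959 or anywhere in slsa-github-generator; `TransientError` and `retry` are hypothetical names. Note that retries only help with temporary failures such as timeouts; they would not have helped with the breaking change discussed in this issue.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def retry(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on TransientError, wait and try again.

    The delay doubles after every failed attempt and gets some random
    jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.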
Just wanted to get a sense of whether it was n(hours, days, weeks). But I ended up reverting, so no pressure at all from me! |
I think this conversation illustrates the importance and difficulty of deep supply chain integrity. My workflow failed because the SLSA provenance generator failed because the Rekor service failed… And the above discussions raise a few more points that should be considered:
@ianlewis I think GA is no safeguard against breakage. I’d argue that this SLSA provenance generator should be tolerant to Rekor failures beyond GA. For example @asraa’s proposed fix above uses an undocumented feature, which should probably be documented for users because it decouples from yet another level of dependencies. Further still, it may even make sense to make |
I agree. We haven't documented well the guarantees that are being provided by using the workflow including dependencies of the workflow like Rekor.
Currently it doesn't. Rekor isn't providing many API guarantees to clients like us. We are currently working on organizing and providing feedback to the sigstore project regarding this stability.
Rekor deploys new versions of the public server independently of slsa-github-generator. The sigstore team doesn't currently have a way to notify clients like us when there are breaking changes. Long term, I don't think it's feasible for all Rekor clients to coordinate and upgrade every time there is a new version of the server.
Part of the reason it's not documented is that it isn't intended to be used under normal circumstances: it lowers the security provided by the workflow, because it allows building from unreleased/untested versions of the workflow (e.g. from HEAD). We only have the input at all to support our pre-submit tests. @asraa suggested it because it's an escape hatch that can unblock users temporarily, but it's really an internal API. While we would like to be resilient to Rekor failures, we do need Rekor to create records in the transparency log so that they can be used during verification. #34, #958, and #959 are the issues we are currently tracking to let users mitigate the problem, but I think the core problem won't ultimately be solved without better guarantees for Rekor's API.
While it avoids a dependency on Rekor when verifying the builder binaries used by the workflow, it doesn't completely remove the dependency on Rekor, because Rekor is still needed to create new entries in the transparency log. As mentioned above, it is also a security trade-off that I don't really want to recommend to users. |
As a new feature, how about allowing building from source with the correct version? Would there still be any security concerns? |
Is there some way to follow along to be notified when this is fixed/released? A different issue that hasn't already been closed, maybe? Edit: Is it #968? |
Two things we've prioritized on this front for the next quarter:
As providers of a security tool, though, we did require the most robust verification (an online verification against Rekor directly), as Ian mentioned. I can see the argument both ways, and reducing the reliance on an online service or transparency log would be helpful in the future.
@ianlewis @laurentsimon can we support a build from the tag checkout using a secure-checkout that ensures that the tag resolves to a safe/attested SHA? This is similar to @behnazh-w's suggestion.
Yes, I'll add a note on release of v1.3.1 in that title. |
Looks like people keep running into this problem (and post on the SLSA Slack workspace) which hasn’t actually been resolved yet. To also address @di’s request above — could we please reopen this issue until it’s actually been addressed? |
We can keep it open until we've done the release. I plan on going through the release process today. To make a long story short, the release process we have is complicated and has a number of unrelated issues, which blocked us for a while. |
+1 to this solution. Some users have complained about the time it takes to build if compile-builder is set to true by default.
Should be do-able. |
I see https://github.com/slsa-framework/slsa-github-generator/releases/tag/v1.2.1 exists, should this be closed? |
Let's close |
@jenstroeger We finally made a release with the fix for this. Please use v1.2.1. |
Describe the bug
It looks like the Generate Builder step fails:
when invoked as a job similar to this public example but in a private repo:
To Reproduce
Restarting the failed job fails again. Alas, private repo.
Expected behavior
This has worked before, but failed during today’s run.
Screenshots
See above.
Additional context