✨ Update cron's JSON format #1001
Conversation
"score": 0, | ||
"reason": "min result reason", | ||
"name": "Check-Name", | ||
"documentation": { |
These will be very repetitive in the BQ table. Is there any benefit to including this in every single row, as opposed to asking users to get this info from elsewhere?
Yep, I agree and thought about this too. BQ tables support column compression (run-length encoding, etc.), so this should not take much space on disk. However, I just looked up pricing and we're charged based on data size before compression, so that's not ideal.
The main reason I included it is to make it easy for users (in particular tools that script the results) to retrieve this info, and for us to keep scorecard changes in sync with where the URL lives. Deps.dev already broke because the doc location changed.
Alternatives:
- Create a URL per JSON result instead of per check. That still requires us to keep url#check-name non-breaking, and it would break if we separate checks into their own doc files (as suggested in Include URL for more information #998).
- A URL that redirects to the doc...
What are better options?
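To make the discussion concrete, here is a minimal Go sketch of a per-check entry carrying its own documentation object. The field and type names are illustrative guesses based on the JSON snippet above, not the actual scorecard definitions:

```go
// Hypothetical per-check result that embeds its own documentation block,
// mirroring the JSON fields shown in the diff above. Names are illustrative.
package format

// Documentation points users at the check's docs without requiring them
// to track where the docs live.
type Documentation struct {
	URL   string `json:"url"`   // stable link to the check's documentation
	Short string `json:"short"` // one-line description of the check
}

// CheckResult is one row per check in the cron JSON output.
type CheckResult struct {
	Name          string        `json:"name"`   // e.g. "Check-Name"
	Score         int           `json:"score"`  // e.g. 0
	Reason        string        `json:"reason"` // e.g. "min result reason"
	Documentation Documentation `json:"documentation"`
}
```

With a shape like this, tools scripting the results read the URL straight from the row instead of hard-coding a doc location that can move.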
@azeemsgoogle any thoughts on this?
Do we want to launch Scorecard V2 (the version with scores)? Given our conversations with ProjectsByIf, it is likely that our result formats will undergo a massive overhaul soon. Do we gain anything by launching this version, then?
It helps users of deps.dev get a better format. I don't know how much traffic they get, but I think they are a source of discovery for scorecard that we should not neglect. @kimsterv @inferno-chromium do you know?
David also expressed interest in having a score-based format in the OSSF dashboard.
As for ProjectsByIf, if we need to change the scorecard format a lot, it will take quite some time to land those changes. In the meantime it's useful to have deps.dev use a score-based version.
Follow-up about pricing: https://cloud.google.com/bigquery/pricing says, under the storage section, $0.020 per GB. If we reach 1M repos, assuming 300 bytes per doc (URL + description), 10 checks per run, and 4 cron runs per month, we get:
$0.020 * 300 * 10 * 1000000 * 4 / 1000000000 = $0.24/month.
Please chime in if I forgot something in the calculation above. If the calculation is correct, pricing is not an issue/blocker.
Given the use of compression by BQ, latency should not be an issue either.
So I think we can keep the doc fields per check in JSON.
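For anyone who wants to double-check the arithmetic, a quick back-of-the-envelope sketch in Go; all inputs are the assumptions above (300 bytes per doc, 10 checks per run, 4 runs/month, 1M repos), not measured values:

```go
// Back-of-the-envelope estimate of the extra BigQuery storage cost from
// repeating the documentation fields in every row. Inputs are the
// assumptions from the comment above, not measurements.
package main

import "fmt"

func main() {
	const (
		pricePerGB   = 0.020           // USD per GB per month (BQ storage pricing)
		bytesPerDoc  = 300.0           // assumed size of url + description per check
		checksPerRun = 10.0            // checks per repo per cron run
		repos        = 1_000_000.0     // target number of repos
		runsPerMonth = 4.0             // cron runs per month
		bytesPerGB   = 1_000_000_000.0 // GB conversion used in the estimate
	)

	gb := bytesPerDoc * checksPerRun * repos * runsPerMonth / bytesPerGB
	fmt.Printf("extra storage: %.1f GB, cost: $%.2f/month\n", gb, gb*pricePerGB)
	// Prints: extra storage: 12.0 GB, cost: $0.24/month
}
```

At roughly 12 GB/month of extra pre-compression storage, the cost stays well under a dollar, which matches the conclusion above.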
@oliverchang waiting for your feedback.
My concern is making the deps.dev and OSSF metrics teams go through two different migrations. There might also be other teams (like Project Thoth - #978 (comment)) who are relying on the BQ data without us knowing about it. It might be worth figuring out a plan that minimizes the overhead of these migrations for our clients.
Re: pricing/storage - IMO it's fine to have this data, even though it may be redundant. For teams relying on our BQ data, it'll be useful to have it present in BQ itself.
Thanks for explaining. Ease of use + link stability seems like a good enough reason to keep this in the table.
Re: other teams relying on the BQ dataset. I think we need to announce breaking changes/new versions on a dedicated channel - maybe a scorecard-announce@ mailing list or a dedicated section on pkg.go.dev, probably both. We cannot, ourselves, keep an inventory of who is using scorecard. I've added an item for the next scorecard sync.
Note that updating the cron JSON does not force our users to use it. We can still support the old format for 6 months or more. That should be part of an SLO we write for each release, e.g., that we will keep supporting the old format for at least X months after releasing a breaking change. This will allow teams to make an informed decision about whether they want to upgrade. Teams should decide if they want to upgrade; we cannot make that decision for them, especially as the number of teams increases and their requirements differ.
Let's discuss this more during the next scorecard sync. It's really important.
https://github.com/ossf/scorecard/blob/main/.github/workflows/main.yml#L23 - this was fixed by @azeemshaikh38 and I was hoping it would fix the issue.
For some reason this is now working. The problem seems to happen only sometimes.
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
feature
What is the current behavior? (You can also link to an open issue here)
The older JSON format.
What is the new behavior (if this is a feature change)?
The same JSON format as scorecard itself; it's a mere copy of the files.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
yes
Other information: