
✨ Update cron's JSON format #1001

Merged: 8 commits, Sep 13, 2021
Conversation

laurentsimon
Contributor

@laurentsimon laurentsimon commented Sep 10, 2021

  • Please check if the PR fulfills these requirements
  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    feature

  • What is the current behavior? (You can also link to an open issue here)
    Older JSON format.

  • What is the new behavior (if this is a feature change)?
    Same JSON format as scorecard itself; it's essentially a copy of files.

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    yes

  • Other information:

```
"score": 0,
"reason": "min result reason",
"name": "Check-Name",
"documentation": {
```
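For reference, a minimal runnable sketch of what one full per-check entry might look like. The snippet above is truncated, so the `documentation` sub-fields (`url`, `short`) and the example URL below are assumptions for illustration, not taken from this PR:

```python
import json

# Hypothetical per-check entry mirroring scorecard's own JSON output.
# The "documentation" sub-fields (url, short) and the URL value are
# assumptions for illustration.
check = {
    "name": "Check-Name",
    "score": 0,
    "reason": "min result reason",
    "documentation": {
        "url": "https://github.com/ossf/scorecard/blob/main/docs/checks.md#check-name",
        "short": "one-line description of the check",
    },
}
print(json.dumps(check, indent=2))
```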
Contributor

These will be very repetitive in the BQ table. Is there any benefit to including this in every single row, as opposed to asking users to get this info from elsewhere?

Contributor Author

@laurentsimon laurentsimon Sep 13, 2021

Yep, I agree and thought about this too. BQ tables support column compression (run-length encoding, etc.), so this should not take much space on disk. However, I just looked up pricing and we're charged based on data size before compression, so that's not ideal.
The main reason I included it is to make it easy for users (in particular tools that script the results) to retrieve, and for us to keep scorecard changes in sync with where the URL lives. Deps.dev already broke once because the doc location changed.

Alternatives:

  1. Create a URL per JSON result instead of per check. That still requires us to keep url#check-name non-breaking. This would break if we separate checks into their own doc files (as suggested in Include URL for more information #998).
  2. URL that redirects to doc...

What are better options?
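Option 1 above could be sketched roughly like this. The base URL and the lower-cased anchor scheme are hypothetical; the only real constraint discussed is that url#check-name stays non-breaking:

```python
# Sketch of option 1: a single docs URL per JSON result, addressed per
# check via a fragment. Base URL and anchor derivation are assumptions.
def check_doc_url(check_name: str) -> str:
    base = "https://github.com/ossf/scorecard/blob/main/docs/checks.md"
    # The fragment must remain stable even if the docs are reorganized.
    return f"{base}#{check_name.lower()}"

print(check_doc_url("Check-Name"))
```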

Contributor Author

@azeemsgoogle any thoughts on this?

Contributor

Do we want to launch Scorecard V2 (version with scores)? Given our conversations with ProjectsByIf, it is likely that our result formats will undergo massive overhaul soon. Do we gain anything by launching this version then?

Contributor Author

@laurentsimon laurentsimon Sep 13, 2021

It helps users of deps.dev get a better format. I don't know how much traffic they get, but I think they are a source of discovery for scorecard we should not neglect. @kimsterv @inferno-chromium do you know?

David also expressed interest in having score-based format in OSSF dashboard.

In terms of ProjectsByIf: if we need to change scorecard's format a lot, it will take quite some time to land those changes. In the meantime it's useful to have deps.dev use a score-based version.

Contributor Author

@laurentsimon laurentsimon Sep 13, 2021

Follow-up about pricing: https://cloud.google.com/bigquery/pricing says, under the storage section, $0.020 per GB. If we reach 1M repos, assuming 300 characters/bytes of doc per check (URL + description), 10 checks per run, and 4 cron runs per month, we get:
$0.020 * 300 * 10 * 1,000,000 * 4 / 1,000,000,000 = $0.24/month.
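The estimate above can be reproduced with a few lines, using the same assumptions stated in the comment (300 bytes of doc per check, 10 checks per run, 1M repos, 4 cron runs per month):

```python
# Reproduce the BigQuery storage cost estimate from the comment above.
price_per_gb = 0.020          # $/GB, BQ storage pricing as cited
bytes_per_check_doc = 300     # URL + description per check
checks_per_run = 10
repos = 1_000_000
runs_per_month = 4

gb = bytes_per_check_doc * checks_per_run * repos * runs_per_month / 1e9  # 12 GB
monthly_cost = price_per_gb * gb
print(f"${monthly_cost:.2f}/month")  # $0.24/month
```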

Please chime in if I forgot something in the calculation above. If the calculation is correct, pricing is not an issue/blocker.

Given the use of compression by BQ, latency should not be an issue either.

So I think we can keep the doc fields per check in JSON.

@oliverchang waiting for your feedback

Contributor

My concern is making the deps.dev and OSSF metrics teams go through 2 different migrations. There might also be other teams (like Project Thoth - #978 (comment)) who are relying on the BQ data without us knowing about it. It might be worth figuring out a plan that minimizes the migration overhead on our clients.

re: pricing/storage - IMO it's fine to have this data, even though it may be redundant. For teams relying on our BQ data, it'll be useful to have this present in BQ itself.

Contributor

Thanks for explaining. Ease of use + link stability seems like a good enough reason to keep this in the table.

Contributor Author

@laurentsimon laurentsimon Sep 13, 2021

re: other teams relying on BQ dataset. I think we need to announce breaking changes/new versions on a dedicated channel. Maybe scorecard-announce@ mailing list or a dedicated section on pkg.go.dev, probably both. We cannot, ourselves, keep an inventory of who is using scorecard. I've added an item for the next scorecard sync.

Note that updating the cron JSON does not force our users to use it. We can still support the old format for 6 months or more. That should be part of some SLO we need to write for each release, e.g., that we will keep supporting the old format for at least X months after releasing a breaking change. This will allow teams to make an informed decision about whether they want to upgrade. Teams should decide if they want to upgrade; we cannot make that call for them unilaterally, especially as the number of teams increases and their requirements differ.

Let's discuss more during the next scorecard sync. That's really important.

@laurentsimon
Contributor Author

Some error:

```
protoc --go_out=../../../ cron/data/metadata.proto
/bin/bash: protoc: command not found
make: *** [Makefile:94: cron/data/metadata.pb.go] Error 127
```

@naveensrinivasan
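As a quick diagnostic sketch for the failure above: make reports Error 127 when the shell cannot find the command at all, so the first thing to check is whether `protoc` is on PATH. This is an illustration, not part of the repo:

```python
import shutil

# Error 127 from make means /bin/bash could not find the command at all,
# so check whether the protoc binary is visible on PATH.
path = shutil.which("protoc")
if path:
    print(f"protoc found at {path}")
else:
    print("protoc not on PATH; install it (e.g. the protobuf-compiler "
          "package) before running make")
```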

@naveensrinivasan
Member

> protoc --go_out=../../../ cron/data/metadata.proto
> /bin/bash: protoc: command not found
> make: *** [Makefile:94: cron/data/metadata.pb.go] Error 127

https://github.com/ossf/scorecard/blob/main/.github/workflows/main.yml#L23 this was fixed by @azeemshaikh38 and I was hoping this would fix the issue

@laurentsimon
Contributor Author

> https://github.com/ossf/scorecard/blob/main/.github/workflows/main.yml#L23 this was fixed by @azeemshaikh38 and I was hoping this would fix the issue

For some reason this is now working. The problem seems to happen only intermittently.

@laurentsimon laurentsimon enabled auto-merge (squash) September 13, 2021 21:44
@laurentsimon laurentsimon merged commit 6178207 into ossf:main Sep 13, 2021