feat(noSecrets): refine the entropy computation to avoid some false positives #4118

SaadBazaz · 2024-09-29T14:29:09Z

Further improves #3823

Research

Since this Biome rule was first created, I've found several other secret-scanning solutions:

The OG, gitleaks (Go): https://github.com/gitleaks/gitleaks/
Trufflehog (Go): https://github.com/trufflesecurity/trufflehog
Sensleak (Rust): https://github.com/crates-pro/sensleak-rs

I've been wondering: can we consume any of the above for our use case? Do we really need to add another tool to the ecosystem?
I'd like to ask the authors here.

In the meanwhile...

Fixes

Fix original eslint attribution
Add better tests for generic use cases

New features planned

Legend:
FP = Feature-Parity with eslint no-secrets (features listed here)

I would love to hear strategies on how to achieve this.

SaadBazaz · 2024-09-29T14:52:13Z

@minht11 @Conaclos @dyc3 - I'd appreciate it if you could read the PR description and share your thoughts.

dyc3 · 2024-09-29T15:07:42Z

I'll defer to the other maintainers for their opinions on adding a dependency for this.

For the proposed options: We generally try to avoid adding options for the sake of adding options. We need an appropriate amount of demand (from users) and justification to determine the correct granularity and scope of those options.

Can we consume any of the above for our use case? Do we really need to add another tool to the ecosystem?

Personally, I think a dedicated tool is probably going to have better heuristics than we will, at least in the short term. Plus, we don't take commit history into account like some of those tools do (and we shouldn't; that's not what Biome is for).

I think this rule will at least be good for picking out the most egregious cases, with the added benefit that enabling the rule in an existing config is easier for users than adding a new tool to their toolchain. We could also consider pointing users to a dedicated tool in the diagnostic message and rule documentation. Gitleaks seems to be the most mature.

minht11 · 2024-09-29T23:09:26Z

To add to what @dyc3 said: in my opinion, Biome likely isn't the right tool to fully replace full-repo secret scanning. Biome does not support all the languages that those tools do, which a proper security solution needs; we also have stricter perf constraints, and so on.

Having this rule in Biome does help development velocity: there is no need to wait for CI, and the majority of users who wouldn't otherwise think about secrets get a warning for common cases, though we should definitely document full solutions in case users need those.

As for integrations listed:
Sensleak - does not have a Rust crate, only a CLI.
The others are Go solutions. I'm not sure how portable they are: can you reasonably call a Go library from Rust? How would the performance be?

dyc3 · 2024-09-30T00:11:01Z

can you reasonably call a Go library from Rust?

No. Turbopack had a whole Zig helper library to interop between Rust and Go while they were porting the project to Rust. Definitely not worth the complexity.

SaadBazaz · 2024-09-30T06:04:36Z

For the proposed options: We generally try to avoid adding options for the sake of adding options. We need an appropriate amount of demand (from users) and justification to determine the correct granularity and scope of those options.

I agree!

Some features would bring us to feature parity with eslint no-secrets; I've updated the description with a legend to reflect that.

We could also consider pointing users to a dedicated tool in the diagnostic message and rule documentation. Gitleaks seems to be the most mature.
... though we should definitely document full solutions in case users need those.

Definitely going to do this!

Sensleak - does not have a Rust crate, only a CLI.

I can port some code then 🤠 Or maybe ask their team to make a crate/API for reusability.

No. Turbopack had a whole Zig helper library to interop between Rust and Go while they were porting the project to Rust. Definitely not worth the complexity.

Yikes.

ematipico · 2024-09-30T10:17:31Z

We usually add options only when required or requested, and when there are valid use cases to cover.

Not sure there's enough value to add yet another dependency for this rule. Sure, it's an important rule, but as the others said, we don't cover all languages. Our documentation could actually propose alternatives to the users.

Conaclos · 2024-09-30T10:31:16Z

I totally agree with the others. The rule should find the most obvious secret leaks and avoid false positives, because users might get annoyed and turn off the rule (which defeats its purpose).

We could add a disclaimer in the rule description saying that, and point to relevant tools (such as the ones you cited) for advanced secret-leak detection.

SaadBazaz · 2024-10-01T20:10:00Z

I totally agree with the others. The rule should find the most obvious secret leaks and avoid false positives, because users might get annoyed and turn off the rule (which defeats its purpose).

We could add a disclaimer in the rule description saying that, and point to relevant tools (such as the ones you cited) for advanced secret-leak detection.

In that case (which I agree with), we should just do two tasks for now:

1) Add the containsSpace heuristic, which will drastically reduce false positives
2) Update the docs and error message to suggest relevant tools

However I do suggest 3) adding the option to control entropy as it's not only easy to add, but also useful for us to learn what people are usually comfortable with so we can improve the default option.

What do you think?
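
For illustration, the whitespace heuristic mentioned above could look roughly like the sketch below. It assumes the heuristic simply skips any string that contains whitespace before the entropy check runs; the helper names `contains_space` and `is_secret_candidate` are hypothetical and not taken from the rule's source.

```rust
/// Hypothetical helper: true when the string contains any whitespace.
fn contains_space(text: &str) -> bool {
    text.chars().any(char::is_whitespace)
}

/// Hypothetical pre-filter (an assumption, not the rule's actual code):
/// only non-empty strings without whitespace go on to the entropy check.
fn is_secret_candidate(text: &str) -> bool {
    !text.is_empty() && !contains_space(text)
}
```

With this pre-filter, a sentence such as "Click here to continue" would never reach the entropy check, while a compact token-like string still would.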

Conaclos · 2024-10-01T21:27:33Z

However I do suggest 3) adding the option to control entropy as it's not only easy to add, but also useful for us to learn what people are usually comfortable with so we can improve the default option.

This looks fair enough to me.

SaadBazaz · 2024-10-02T07:59:48Z

@Conaclos - Noob question:

#[derive(Clone, Debug, Default, Deserialize, Deserializable, Eq, PartialEq, Serialize)]
#[cfg_attr(feature = "schema", derive(schemars::JsonSchema))]
#[serde(rename_all = "camelCase", deny_unknown_fields)]
pub struct NoSecretsOptions {
    /// Set entropy threshold (default is 4.5).
    entropy_threshold: f64,
}

How can I get f32 or f64 options (i.e. any floating point option) in Options?

Conaclos · 2024-10-02T08:21:13Z

How can I get f32 or f64 options (i.e. any floating point option) in Options?

ctx.options().entropy_threshold in the run/diagnostic methods.

SaadBazaz · 2024-10-02T15:27:35Z

@Conaclos - I'm getting the following error:

error[E0277]: the trait bound `f64: std::cmp::Eq` is not satisfied
   --> crates/biome_js_analyze/src/lint/nursery/no_secrets.rs:135:5
    |
130 | #[derive(Clone, Debug, Default, Deserialize, Deserializable, Eq, PartialEq, Serialize)]
    |                                                              -- in this derive macro expansion
...
135 |     entropy_threshold: f64, // @TODO: Doesn't work currently.
    |     ^^^^^^^^^^^^^^^^^^^^^^ the trait `std::cmp::Eq` is not implemented for `f64`
    |
    = help: the following other types implement trait `std::cmp::Eq`:
              i128
              i16
              i32
              i64
              i8
              isize
              u128
              u16
            and 4 others
note: required by a bound in `AssertParamIsEq`

Reading the docs:

The compiler should warn you that MyRuleOptions does not implement some required types. We currently require implementing serde's traits Deserialize/Serialize.

https://github.com/biomejs/biome/blob/main/crates/biome_analyze/CONTRIBUTING.md#:~:text=The%20compiler%20should%20warn%20you%20that%20MyRuleOptions%20does%20not%20implement%20some%20required%20types.%20We%20currently%20require%20implementing%20serde%27s%20traits%20Deserialize/Serialize.

It seems like f64 isn't supported in Options, am I missing out on something?

Conaclos · 2024-10-03T08:06:56Z

@SaadBazaz

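f64 and f32 don't implement Eq (they only implement PartialEq), because float equality is not reflexive: NaN is not equal to itself. Directly comparing floats for equality is also error-prone.
I am unsure why we need the options to be Eq/PartialEq.
I will have to investigate whether we can remove that requirement.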

For now, I see two possible approaches:

1. Implement PartialEq (and Eq) ourselves.
Floats are usually compared by taking the absolute value of their difference and checking that it is smaller than EPSILON.

impl Eq for MyOptions {}
impl PartialEq for MyOptions {
    fn eq(&self, other: &Self) -> bool {
        // Treat thresholds that differ by less than EPSILON as equal.
        (self.entropy_threshold - other.entropy_threshold).abs() < f64::EPSILON
    }
}

2. Use an integer instead of a float for the threshold.
For example, we could use a u16.
We could either divide the provided threshold by 10 (or even 100 if we need more precision, though I'm not sure we do) before comparing it against the computed entropy.
Or we could do the reverse: multiply the computed entropy by 10 (or 100) and truncate it before comparing it against the provided threshold.

Personally, I would choose (2).
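
Both variants of the integer approach can be sketched as follows. This is a minimal illustration, assuming the threshold is stored in tenths (so 45 stands for 4.5); the struct shape and the helper names `exceeds_by_dividing` and `exceeds_by_scaling` are hypothetical, not Biome's actual code.

```rust
/// Hypothetical options struct: the threshold is stored in tenths,
/// so 45 stands for an entropy of 4.5. An integer field lets us derive `Eq`.
#[derive(Clone, Debug, Default, Eq, PartialEq)]
pub struct NoSecretsOptions {
    pub entropy_threshold: u16,
}

/// Variant A: divide the configured integer back into a float.
fn exceeds_by_dividing(computed_entropy: f64, options: &NoSecretsOptions) -> bool {
    computed_entropy > f64::from(options.entropy_threshold) / 10.0
}

/// Variant B: scale the computed entropy by 10 and truncate it.
fn exceeds_by_scaling(computed_entropy: f64, options: &NoSecretsOptions) -> bool {
    // `as u16` truncates toward zero and saturates on out-of-range values.
    (computed_entropy * 10.0) as u16 > options.entropy_threshold
}
```

The two variants can disagree near the boundary: with a threshold of 45, a computed entropy of 4.55 exceeds the threshold when dividing (4.55 > 4.5) but not when truncating (45 is not greater than 45).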

SaadBazaz · 2024-10-03T09:00:51Z

Personally, I would choose (2).

Yeah I thought about (2) too, it'll allow us to have an abstraction for later on when/if we change the underlying entropy function. I'll go for that.

SaadBazaz · 2024-10-03T19:57:29Z

Updated entropy function to add a few more metrics which I thought might be useful (i.e. checking consecutive upper/lower cases, checking presence of numbers and symbols)
Had to comment out some tests :( I would love to cater to them all, but there are too many hyperparameters to balance. I need to go deeper into entropy or have better regexes.
Added entropy_threshold as an option.
Updated docs
Updated error message
Updated inspiration to original plugin
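
As a rough illustration of the kind of metrics listed above, the sketch below computes plain Shannon entropy plus a few extra signals (case switches, digits, symbols). It is not the function from this PR: the names `shannon_entropy`, `case_switches`, `SecretSignals`, and `analyze` are hypothetical, and how the signals would be weighted into a final score is deliberately left out.

```rust
use std::collections::HashMap;

/// Shannon entropy of `text`, in bits per character.
fn shannon_entropy(text: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for ch in text.chars() {
        *counts.entry(ch).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&count| {
            let p = count as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Number of adjacent letter pairs that switch case (as in "lIkE").
fn case_switches(text: &str) -> usize {
    let chars: Vec<char> = text.chars().collect();
    chars
        .windows(2)
        .filter(|pair| {
            (pair[0].is_uppercase() && pair[1].is_lowercase())
                || (pair[0].is_lowercase() && pair[1].is_uppercase())
        })
        .count()
}

/// Hypothetical bundle of signals that a scoring function could combine.
#[derive(Debug)]
struct SecretSignals {
    entropy: f64,
    case_switches: usize,
    has_digit: bool,
    has_symbol: bool,
}

fn analyze(text: &str) -> SecretSignals {
    SecretSignals {
        entropy: shannon_entropy(text),
        case_switches: case_switches(text),
        has_digit: text.chars().any(|c| c.is_ascii_digit()),
        has_symbol: text.chars().any(|c| c.is_ascii_punctuation()),
    }
}
```

Keeping the signals separate from the final score makes it easier to tune weights against the valid/invalid test fixtures without touching the measurement code.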

codspeed-hq · 2024-10-03T20:33:57Z

CodSpeed Performance Report

Merging #4118 will degrade performances by 8.01%

Comparing SaadBazaz:v2/no-secrets (d65ac68) with main (055b0db)

Summary

❌ 1 regressions
✅ 104 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`SaadBazaz:v2/no-secrets`	Change
❌	`js_analyzer[lint_13640784270757307929.ts]`	30.3 ms	32.9 ms	-8.01%

SaadBazaz · 2024-10-04T21:13:04Z

@Conaclos @dyc3 Would love a review on the work so far.

dyc3

Just a quick review.

dyc3 · 2024-10-05T00:19:06Z

crates/biome_js_analyze/tests/specs/nursery/noSecrets/valid.js

+// TODO: Remove these false positives, they unfortunately hurt the user experience.
+// const NAMESPACE_CLASSNAME = 'Validation.JSONSchemaValidationUtilsImplFactory';
+// const BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
+// const webpackFriendlyConsole = require('./config/webpack/webpackFriendlyConsole');


dyc3: We could easily exclude strings inside require() and import() calls, but I'm not blocking this PR for it.

SaadBazaz: Would that be by looking at the sibling / previous node in the AST?

dyc3: The parent, but yeah.

SaadBazaz: We could do that in V3, alongside implementing JavaScript comments. What do you say? We haven't traversed the AST in this rule yet.
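
The parent check discussed above can be sketched on a toy tree. This is only an illustration under stated assumptions: the `Node` enum and `collect_candidates` function are hypothetical and do not reflect Biome's real syntax tree API, where the same idea would be expressed by inspecting the string literal's parent node.

```rust
/// Toy AST for illustration only; Biome's real syntax tree is different.
enum Node<'a> {
    StringLiteral(&'a str),
    Call { callee: &'a str, args: Vec<Node<'a>> },
    Block(Vec<Node<'a>>),
}

/// Collects string literals that should still be checked for secrets,
/// skipping literals whose parent is a `require(...)` or `import(...)` call.
fn collect_candidates<'a>(
    node: &Node<'a>,
    parent_callee: Option<&str>,
    out: &mut Vec<&'a str>,
) {
    match node {
        Node::StringLiteral(value) => {
            // Module paths are not secrets, so skip them.
            if !matches!(parent_callee, Some("require" | "import")) {
                out.push(*value);
            }
        }
        Node::Call { callee, args } => {
            for arg in args {
                collect_candidates(arg, Some(*callee), out);
            }
        }
        Node::Block(children) => {
            for child in children {
                collect_candidates(child, None, out);
            }
        }
    }
}
```

Applied to a tree containing `require('./config/webpack/webpackFriendlyConsole')`, the path literal is dropped before any entropy computation, which would address that commented-out false positive.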

Conaclos · 2024-10-07T08:29:26Z

crates/biome_js_analyze/src/lint/nursery/no_secrets.rs

+For example, Continuous mixed cases (lIkE tHiS) are more likely to contribute to a higher score than single cases.
+Symbols also contribute highly to secrets.


Conaclos: Have you based this on existing work? It could be worth adding references.

SaadBazaz: Not really. I made some assumptions and did some prompt engineering (attached in the ChatGPT chat). I'm considering reading some papers on this topic to see if there's a better entropy function out in the wild. Do you have any recommended reading?

Conaclos: Unfortunately, I have no expertise in that domain and haven't taken the time to read the literature. Have you tried the new version of the rule on some codebases?

SaadBazaz: I've added more test cases in valid.js; the new version cleared at least 1-2 more false positives (some still remain). Not a HUGE impact, though.

I will see if I can find time to read through the literature; however, I might defer it to one of my company's team members if it takes too long.

SaadBazaz added 2 commits September 29, 2024 18:06

docs: update eslint inspiration

045319b

test: add more invalid and valid tests, specially from original code and user feedback

5d5477c

github-actions bot added the A-Linter (Area: linter) and L-JavaScript (Language: JavaScript and super languages) labels Sep 29, 2024

dyc3 reviewed Sep 29, 2024

chore: recommended false

02fb62d

SaadBazaz changed the title ~~V2/no-secrets~~ feat(no-secrets): V2 Sep 30, 2024

SaadBazaz mentioned this pull request Sep 30, 2024

Make a crate / programmatic interface crates-pro/sensleak-rs#32

Open

feat(wip): add temp option for entropy_threshold, doesn't work right now

70901ff

feat(wip): push latest code, but option doesn't work

e2a7a1e

SaadBazaz added 2 commits October 4, 2024 00:49

feat: add more in-grained entropy function, check heuristics for fail-fast, comment some non-working tests

dcd2339

docs: improve error message

76249e2

SaadBazaz requested a review from dyc3 October 3, 2024 19:57

SaadBazaz added 2 commits October 4, 2024 00:58

chore: clean comment whitespace

65c9369

chore: clean stale comment

8adaa65

SaadBazaz changed the title ~~feat(no-secrets): V2~~ feat(linter): implement noSecrets V2 Oct 3, 2024

SaadBazaz added 2 commits October 4, 2024 01:05

chore: remove print statements

b5c4546

Merge branch 'main' into v2/no-secrets

bf66293

chore: format file

a17a69b

dyc3 reviewed Oct 5, 2024

Conaclos reviewed Oct 6, 2024

minht11 reviewed Oct 6, 2024

SaadBazaz added 3 commits October 7, 2024 10:20

docs: make language easier, add sonarqube

304c171

docs: add list of secrets detected

d90fa11

docs: improve messages using markup

15d9399

Conaclos reviewed Oct 7, 2024

SaadBazaz and others added 2 commits October 7, 2024 16:49

chore: check for safe patterns

7f5d2e8

Merge branch 'main' into v2/no-secrets

f40fa20

Conaclos approved these changes Oct 7, 2024

Conaclos changed the title ~~feat(linter): implement noSecrets V2~~ feat(noSecrets): refine the entropy computation to avoid some false positives Oct 7, 2024

Conaclos reviewed Oct 7, 2024

Apply suggestions from code review

d65ac68

Conaclos merged commit 731d00d into biomejs:main Oct 7, 2024
11 checks passed
