Validation of secrets #142
Replies: 9 comments
-
Yes, automated or semi-automated secret validation would be very useful. It doesn't exist presently, not for lack of interest, but because of focus and engineering constraints. The design space for validation mechanisms is large, so thank you for the implementation sketch and for pointing at some existing resources. Ideally, a useful validation mechanism could be specified alongside the rule YAML data files, rather than as special-cased code in Nosey Parker. I will ruminate on this more. Thank you for writing this up!
-
Slightly related is #59, a proposal for another invalidation mechanism for findings. There are several existing rules, such as those for detecting JWTs and PEM-encoded private keys, where some additional downstream parsing could eliminate lots of false positives but can't be expressed with regular expressions. That is: if a supposed PEM-encoded private key doesn't actually decode as PEM, ignore it.
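To make that concrete, here is a minimal sketch of what such a post-match invalidation step could look like, assuming the `pem` crate; the function name and integration point are hypothetical, not Nosey Parker's actual API:

```rust
// Hypothetical post-match filter: discard "PEM private key" findings
// whose matched text does not actually parse as a PEM document.
// Uses the `pem` crate (https://crates.io/crates/pem).

/// Returns true if the candidate match should be kept as a finding.
fn keep_pem_finding(candidate: &str) -> bool {
    // pem::parse returns Err for anything that is not well-formed PEM,
    // which lets us drop false positives that merely *look* like a key.
    pem::parse(candidate).is_ok()
}

fn main() {
    let plausible = "-----BEGIN PRIVATE KEY-----\nTUlJQ2RnSUJBREFOQmdrcQ==\n-----END PRIVATE KEY-----";
    let garbage = "-----BEGIN PRIVATE KEY-----\nnot base64 at all!!\n-----END PRIVATE KEY-----";
    assert!(keep_pem_finding(plausible));
    assert!(!keep_pem_finding(garbage));
}
```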
-
Yeah, interesting. Another example of invalidation I can think of is the Atlassian API token, which has a CRC checksum at the end that would help rule out false positives. But at that point, you may as well send a request to validate the token and skip the middle step: trufflesecurity/trufflehog#649 (comment). The design space is very large. I agree that having validation as part of the YAML rather than as special-cased code would be best, since it allows more contributors and customization, versus the way TruffleHog is implemented.
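For illustration, a checksum-based invalidation step might look like the sketch below, using the `crc32fast` crate. The split point and encoding are assumptions made up for the example (an 8-hex-digit CRC32 suffix), not Atlassian's documented token layout; see the linked trufflehog thread for the real details:

```rust
// Hypothetical checksum invalidation, in the spirit of the Atlassian
// token scheme mentioned above. ASSUMPTION: the token ends with an
// 8-hex-digit CRC32 of everything before it; the real layout may differ.
fn checksum_looks_valid(token: &str) -> bool {
    if token.len() < 9 {
        return false;
    }
    let (body, tail) = token.split_at(token.len() - 8);
    let Ok(expected) = u32::from_str_radix(tail, 16) else {
        return false;
    };
    // crc32fast::hash computes the standard (IEEE) CRC32 of the bytes.
    crc32fast::hash(body.as_bytes()) == expected
}
```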
-
Also interesting are things like `Password: ${{secret}}`. I get lots of these, and they are easily regexable (see the sketch after the snippet below). In this case it means I look up the secret value from a vault and replace the `${secret}` placeholder or whatnot with the actual secret, but NOT in the code. For example:

    url = "${var.environment.ace.acr_name}.azurecr.io"
    registry = {
      url      = local.url
      username = var.environment.ace.acr_user
      password = var.environment.ace.acr_token
    }
    start_command_line = ""
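A minimal sketch of the "easily regexable" part, assuming the `regex` crate; the exact pattern here is illustrative, not a proposed rule:

```rust
use regex::Regex;

fn main() {
    // Illustrative pattern for template placeholders like ${{ secret }}
    // or ${var.environment.ace.acr_token}: values that are vault or
    // variable references rather than literal secrets, and therefore
    // safe to suppress as findings.
    let placeholder = Regex::new(r"^\$\{\{?\s*[\w.\-]+\s*\}?\}$").unwrap();
    assert!(placeholder.is_match("${{secret}}"));
    assert!(placeholder.is_match("${var.environment.ace.acr_token}"));
    assert!(!placeholder.is_match("hunter2"));
}
```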
-
I did look at the ML paper at the end of the original 'presentation' that helped me discover this. They seemed to like the 'voting' algorithm. I do understand why you don't include it (since your business needs to make some $$$ so you don't starve and have a living wage), but I was wondering about 'false positives' vs. 'false negatives' and ML. It MIGHT not hurt your business too much if we maintained a list of 'false positive' examples that I could train with my OWN ML thingies. The list itself could be community-contributed, and would help most folks using this, WITHOUT adding the ML portion to Nosey and hence hurting your business.
-
The list might be a JSON-encoded document with type: FN or FP (false negative or false positive), which could then be fed to the training gods. A community-maintained list of things Nosey detects would HELP your business (and anyone else who wants to do ML, alas).
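For instance, a single entry in such a dataset might look something like this; the field names beyond `type` are made up here, purely to illustrate the shape:

```json
{
  "type": "FP",
  "rule": "Generic Password",
  "snippet": "password = var.environment.ace.acr_token",
  "note": "terraform variable reference, not a literal credential"
}
```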
-
A public dataset of secrets would be very interesting and useful. I'm not aware of any open dataset like that today; particularly in the case of valid secrets, there is a security concern with publishing them, as doing so could enable drive-by hacking. It would be pretty interesting to keep track of false positives and use them at runtime to suppress any matches from that list. A deny-list capability like that would be helpful regardless, especially for use in CI: there, you need either 0% false positives or a mechanism to ignore a finding, or else you can't use the tool as a possibly build-breaking step.
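A minimal sketch of that deny-list idea, assuming findings are fingerprinted by a SHA-256 of the matched text; the file format and function names are hypothetical, not an existing Nosey Parker feature:

```rust
use sha2::{Digest, Sha256};
use std::collections::HashSet;

// Hypothetical deny-list file: one lowercase hex SHA-256 fingerprint per
// line, checked into the repo so CI runs can suppress known false positives.
fn load_denylist(contents: &str) -> HashSet<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .map(str::to_owned)
        .collect()
}

fn fingerprint(matched_text: &str) -> String {
    // Hex-encode the SHA-256 digest of the matched bytes.
    format!("{:x}", Sha256::digest(matched_text.as_bytes()))
}

fn is_suppressed(denylist: &HashSet<String>, matched_text: &str) -> bool {
    denylist.contains(&fingerprint(matched_text))
}
```

Fingerprinting by hash rather than storing the matched text verbatim also sidesteps the publishing concern above: a committed deny-list never contains the (possibly secret-looking) strings themselves.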
-
I hadn't thought about 'oops, we just published our AWS root keys on GitHub'. So perhaps examples of things that AREN'T secrets. Where this gets especially interesting is things like `$password` etc. as variable names; we have NUMEROUS cases of this, and the false positives are quite varied. Hence the "machine learning ready" dataset of false positives.
-
And by just listing them, we really don't 'steal' anything from your company: the 'value added' ML feature stays yours, so you remain a legitimate business!
-
I'd love to use Nosey Parker for defensive purposes, and being able to filter results to show only secrets which are confirmed valid is essential.

It should be an optional feature, likely occurring during the `report` command to keep scans performant. Each rule could have an optional section listing a URL plus parameters/headers that specify what network request to make in order to verify the credential. The slack rule https://github.com/praetorian-inc/noseyparker/blob/74098e8881bd68a6368032b78b56ddad5ecbbad4/crates/noseyparker/data/default/rules/slack.yml could have a section which specifies

    curl https://slack.com/api/auth.test -H "Authorization: Bearer $(1)"

as the request to make, with `$(1)` as a placeholder for the first capture group from the regex. Many projects re-invent their own request/response matching format; I'm very curious whether you might be able to integrate an existing system like https://hurl.dev, with a flexible way to test a response to confirm a valid secret.
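Sketching what that could look like inline in the rule file: the `validation` section below is purely hypothetical, and the `pattern` shown is illustrative rather than copied from the actual slack.yml, but `$(1)` follows the placeholder convention above:

```yaml
# Hypothetical extension to a rule file -- not an existing Nosey Parker feature
name: Slack Token
pattern: '\b(xox[pboa]-[0-9a-z\-]+)\b'   # illustrative, not the real rule
validation:
  request:
    method: POST
    url: https://slack.com/api/auth.test
    headers:
      Authorization: Bearer $(1)
  # Treat the secret as confirmed valid only if the response matches:
  response:
    status: 200
    json:
      ok: true
```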