-
Notifications
You must be signed in to change notification settings - Fork 472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect options from baseline #124
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test cases! Fix and ship!
detect_secrets/main.py
Outdated
if args.import_filename: | ||
plugins = initialize.merge_plugin_from_baseline( | ||
_get_plugin_from_baseline(args.import_filename), args, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this block inside _perform_scan
, right after we get the old_baseline.
e.g.
def _perform_scan(args, plugins):
old_baseline = _get_existing_baseline(args.import_filename)
if old_baseline:
plugins = initialize.merge_plugin_from_baseline(...)
This way, we don't need to incur two file reads. Also, I'm not entirely sure that reading from stdin
would work twice.
tests/main_test.py
Outdated
) == 0 | ||
|
||
print("Used:", file_writer.call_args[1]['data']['plugins_used']) | ||
print("Wrote:", plugins_wrote) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove these print
statements, before merging.
tests/pre_commit_hook_test.py
Outdated
@@ -44,17 +44,53 @@ def test_file_with_secrets(self, mock_log): | |||
def test_file_no_secrets(self): | |||
assert_commit_succeeds('test_data/files/file_with_no_secrets.py') | |||
|
|||
def test_baseline(self): | |||
@pytest.mark.parametrize( | |||
' has_result, use_private_key_scan,, hook_command, commit_result', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra comma? maybe s/commit_result/commit_succeeds
? (also prefixed space, but less fussy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, extra comma. The rename sounds good. I will update in a new commit.
tests/main_test.py
Outdated
}, | ||
], | ||
), | ||
( # ignore overwriten option from CLI when not using --use-all-plugins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. If you explicitly override it by providing the CLI argument, even without --use-all-plugins
, wouldn't you expect CLI to take precedence over a configured baseline?
Example use case: user wants to see if a decrease in sensitivity results in a large drop in noise.
$ detect-secrets scan --update .secrets.baseline --base64-limit 5
$ git diff .secrets.baseline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated about that, it might have some edge cases. For example, if the baseline does not have the base64 scan, do we ignore the limit? or add the base64 scan in? So I chose to let the user on purposely use the plugin, then allow adjusting. How would you suggest to adjust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point you brought up.
What about displaying a warning when they don't have the plugin installed, then ignoring it? Seems to accomplish several things:
- Allows CLI arguments to configure limits, when they are relevant.
- Ignores CLI arguments when not relevant.
- Informs the user that arguments passed are redundant, yet not hard blocking due to poor invocation of command.
$ detect-secrets scan --update .secrets.baseline --base64-limit 5
WARN: --base64-limit specified, but Base64HighEntropyString not configured! Ignoring...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's updated to support these cases now. New test cases added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great so far
return plugins | ||
|
||
|
||
def _merge_plugin_from_baseline(baseline_plugins, args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to _trim_disabled_plugins_from_baseline
, it would make the make the comments in merge_plugin_from_baseline
less needed.
|
||
|
||
def _merge_plugin_from_baseline(baseline_plugins, args): | ||
merged_plugins_dict = {vars(plugin)['name']: plugin for plugin in baseline_plugins} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great use of vars
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Credit to @meneal
9fbeff6
to
4570adf
Compare
4570adf
to
a2f74dd
Compare
@domanchi ready for another round. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Thanks for writing that hairy logic of plugin prioritization.
|
||
# input param priority > baseline | ||
input_plugins_dict = dict(args.plugins) | ||
for plugin_name, plugin_params in list(input_plugins_dict.items()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need to turn input_plugins_dict.items()
into a list. It gives us an iterator, so we get some performance boosts by avoiding turning it into a list, before iterating through it.
Same goes for other uses of this behavior.
if args.disabled_plugins: | ||
for plugin_name in args.disabled_plugins: | ||
if plugin_name in plugins_dict: | ||
plugins_dict.pop(plugin_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plugins_dict = {
plugin_name: plugin_params
for plugin_name, plugin_params in baseline_plugins_dict.items()
if plugin_name not in args.disabled_plugins
}
log.warning( | ||
'%s specified, but %s not configured! Ignoring...' | ||
% ("".join(["--", param_name.replace("_", "-")]), plugin_name), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this DRYer, seeing that we use it below as well? One idea might be to implement an iterator, or just encapsulate it in a function. For example:
def get_prioritized_parameters(plugins, is_default_plugins_map, prefer_default=True):
"""
:type is_default_plugins_map: dict(str => bool)
:param is_default_plugins_map: mapping of parameter name to whether its value is derived
from a default value.
:param prefer_default: if True, will yield if plugin parameters are from default values.
Otherwise, will yield if plugin parameters are *not* from default values.
"""
...
yield plugin_name, param_name, param_value
Then, you can do:
plugins_dict = dict(args.plugins)
for plugin_name, param_name, param_value in get_prioritized_parameters(
input_plugins_dict,
args.param_from_default,
prefer_default=False,
):
try:
plugins_dict[plugin_name][param_name] = param_value
except KeyError:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been updated now. Although having more lines, but I do think the logic is a little bit more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearer logic >> more lines, especially when it comes to long term maintenance 😸
@@ -309,6 +309,7 @@ def consolidate_args(args): | |||
|
|||
active_plugins = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put a comment here that explains the rationale for why we're returning these new values.
I understand the purpose of doing it, and that there's really no better way to incorporate the baseline values in it. It's just unfortunate that this hairy logic to perform prioritization of plugins needs to live in two places (unlike this doc string suggests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disabled_plugins
is mainly used to filter off some disabled plugins when reading the plugin list from baseline. Relates to the comment below, it's possible to generate the disabled_plugins
list on-demand by calculating the difference between all plugins and active plugins. I can add a helper function to the usage.py
.
I was using disabled_plugins
earlier to avoid re-iterating on the all plugins to get such list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Makes sense.
Thanks for adding the helper function anyway!
detect_secrets/core/usage.py
Outdated
@@ -309,6 +309,7 @@ def consolidate_args(args): | |||
|
|||
active_plugins = {} | |||
disabled_plugins = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, what is our purpose of returning a set of disabled_plugins? Since args.plugins
is a collection of all active plugins (combining default values and specifically set values), wouldn't any plugin that isn't in this dictionary be disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
detect_secrets/core/usage.py
Outdated
@@ -309,6 +309,7 @@ def consolidate_args(args): | |||
|
|||
active_plugins = {} | |||
disabled_plugins = {} | |||
param_from_default = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe is_default_parameter
, so it's clearer that it contains boolean values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I would make a suggestion to rename it to is_using_default_value
It definitely makes the job more challenge when I have to convert the BasePlugin tuple into a dictionary, then compares and sets values. I noticed the BasePlugin class has made |
@domanchi all comments addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask what was the rationale of overwriting
__dict__
instead of directly using property on each field?
Seeing that this package is designed with baseline readability in mind, I wanted an easy way for each plugin to output information that:
- Was easy for humans to look at, and reason about
- Was sufficient information to initialize the plugin from
I think the baseline output seems to achieve those goals.
If I'm interpreting your question correctly, I think for the most part, you can use the property on each field, with the exception that different plugins may have different properties. And I certainly don't have any qualms against something like:
class BasePlugin:
self.name = self.__class__.__name__
except that I didn't need it at the time :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome 👍
Fully implements the proposal in #121 (comment). More than half of the changes are new test cases.
Some logic can be implemented in different places, I chose the current way based on my understanding. Let me know if you have any suggetsions or comments.
FYI @KevinHock @domanchi
CC @jribm