Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect options from baseline #124

Merged
merged 9 commits into from
Feb 6, 2019
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,10 @@
/.coverage
/.pytest_cache
/.tox
/venv**
/venv*
/tmp

.*ignore
!.gitignore
.python-version
.vscode
4 changes: 2 additions & 2 deletions detect_secrets/core/secrets_collection.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,13 +43,13 @@ def load_baseline_from_string(cls, string):
:raises: IOError
"""
try:
return cls._load_baseline_from_dict(json.loads(string))
return cls.load_baseline_from_dict(json.loads(string))
except (IOError, ValueError):
log.error('Incorrectly formatted baseline!')
raise

@classmethod
def _load_baseline_from_dict(cls, data):
def load_baseline_from_dict(cls, data):
"""Initializes a SecretsCollection object from dictionary.

:type data: dict
Expand Down
27 changes: 25 additions & 2 deletions detect_secrets/core/usage.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,14 @@
from detect_secrets import VERSION


def add_use_all_plugins_argument(parser):
parser.add_argument(
'--use-all-plugins',
action='store_true',
help='Use all available plugins to scan files.',
)


class ParserBuilder(object):

def __init__(self):
Expand All @@ -21,7 +29,8 @@ def add_default_arguments(self):

def add_pre_commit_arguments(self):
self._add_filenames_argument()\
._add_set_baseline_argument()
._add_set_baseline_argument()\
._add_use_all_plugins_argument()

PluginOptions(self.parser).add_arguments()

Expand Down Expand Up @@ -78,6 +87,9 @@ def _add_set_baseline_argument(self):
)
return self

def _add_use_all_plugins_argument(self):
add_use_all_plugins_argument(self.parser)


class ScanOptions(object):

Expand Down Expand Up @@ -122,6 +134,9 @@ def _add_initialize_baseline_argument(self):
dest='import_filename',
)

# Pairing `--update` with `--use-all-plugins` to overwrite plugins list from baseline
add_use_all_plugins_argument(self.parser)

self.parser.add_argument(
'--all-files',
action='store_true',
Expand Down Expand Up @@ -293,16 +308,21 @@ def consolidate_args(args):
return

active_plugins = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put a comment here that explains the rationale for why we're returning these new values.

I understand the purpose of doing it, and that there's really no better way to incorporate the baseline values in it. It's just unfortunate that this hairy logic to perform prioritization of plugins needs to live in two places (unlike this doc string suggests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disabled_plugins is mainly used to filter off some disabled plugins when reading the plugin list from baseline. Relates to the comment below, it's possible to generate the disabled_plugins list on-demand by calculating the difference between all plugins and active plugins. I can add a helper function to the usage.py.

I was using disabled_plugins earlier to avoid re-iterating on the all plugins to get such list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Makes sense.

Thanks for adding the helper function anyway!

disabled_plugins = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, what is our purpose of returning a set of disabled_plugins? Since args.plugins is a collection of all active plugins (combining default values and specifically set values), wouldn't any plugin that isn't in this dictionary be disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

param_from_default = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe is_default_parameter, so it's clearer that it contains boolean values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I would make a suggestion to rename it to is_using_default_value


for plugin in PluginOptions.all_plugins:
arg_name = PluginOptions._convert_flag_text_to_argument_name(
plugin.disable_flag_text,
)

# Remove disabled plugins
is_disabled = getattr(args, arg_name)
is_disabled = getattr(args, arg_name, False)
delattr(args, arg_name)
if is_disabled:
disabled_plugins.update({
plugin.classname: {},
})
continue

# Consolidate related args
Expand All @@ -323,12 +343,15 @@ def consolidate_args(args):

if default_value and related_args[arg_name] is None:
related_args[arg_name] = default_value
param_from_default[arg_name] = True

active_plugins.update({
plugin.classname: related_args,
})

args.plugins = active_plugins
args.disabled_plugins = disabled_plugins
args.param_from_default = param_from_default

def _add_custom_limits(self):
high_entropy_help_text = (
Expand Down
18 changes: 14 additions & 4 deletions detect_secrets/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from detect_secrets.core import baseline
from detect_secrets.core.common import write_baseline_to_file
from detect_secrets.core.log import log
from detect_secrets.core.secrets_collection import SecretsCollection
from detect_secrets.core.usage import ParserBuilder
from detect_secrets.plugins.common import initialize

Expand Down Expand Up @@ -39,10 +40,7 @@ def main(argv=None):
_scan_string(line, plugins)

else:
baseline_dict = _perform_scan(
args,
plugins,
)
baseline_dict = _perform_scan(args, plugins)

if args.import_filename:
write_baseline_to_file(
Expand Down Expand Up @@ -79,6 +77,14 @@ def main(argv=None):
return 0


def _get_plugin_from_baseline(old_baseline):
plugins = []
if old_baseline and "plugins_used" in old_baseline:
secrets_collection = SecretsCollection.load_baseline_from_dict(old_baseline)
plugins = secrets_collection.plugins
return plugins


def _scan_string(line, plugins):
longest_plugin_name_length = max(
map(
Expand Down Expand Up @@ -106,6 +112,10 @@ def _perform_scan(args, plugins):
:rtype: dict
"""
old_baseline = _get_existing_baseline(args.import_filename)
if old_baseline:
plugins = initialize.merge_plugin_from_baseline(
_get_plugin_from_baseline(old_baseline), args,
)

# Favors --exclude argument over existing baseline's regex (if exists)
if args.exclude:
Expand Down
65 changes: 65 additions & 0 deletions detect_secrets/plugins/common/initialize.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,71 @@ def from_parser_builder(plugins_dict):
return tuple(output)


def merge_plugin_from_baseline(baseline_plugins, args):
"""
:type baseline_plugins: tuple of BasePlugin
:param baseline_plugins: BasePlugin instances from baseline file

:type args: dict
:param args: diction of arguments parsed from usage

param priority: input param > baseline param > default

:Returns tuple of initialized plugins
"""
def _remove_key(d, key):
r = dict(d)
r.pop(key)
return r
baseline_plugins_dict = {
vars(plugin)["name"]: _remove_key(vars(plugin), "name")
for plugin in baseline_plugins
}

# Use input plugin as starting point
if args.use_all_plugins:
# input param and default param are used
plugins_dict = dict(args.plugins)

# baseline param priority > default
for plugin_name, plugin_params in list(baseline_plugins_dict.items()):
for param_name, param_value in list(plugin_params.items()):
from_default = args.param_from_default.get(param_name, False)
if from_default:
try:
plugins_dict[plugin_name][param_name] = param_value
except KeyError:
log.warning(
'Baseline contain plugin %s which is not in all plugins! Ignoring...'
% (plugin_name),
)

return from_parser_builder(plugins_dict)

# Use baseline plugin as starting point
plugins_dict = baseline_plugins_dict
if args.disabled_plugins:
for plugin_name in args.disabled_plugins:
if plugin_name in plugins_dict:
plugins_dict.pop(plugin_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins_dict = {
    plugin_name: plugin_params
    for plugin_name, plugin_params in baseline_plugins_dict.items()
    if plugin_name not in args.disabled_plugins
}


# input param priority > baseline
input_plugins_dict = dict(args.plugins)
for plugin_name, plugin_params in list(input_plugins_dict.items()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to turn input_plugins_dict.items() into a list. It gives us an iterator, so we get some performance boosts by avoiding turning it into a list, before iterating through it.

Same goes for other uses of this behavior.

for param_name, param_value in list(plugin_params.items()):
from_default = args.param_from_default.get(param_name, False)
if from_default is False:
try:
plugins_dict[plugin_name][param_name] = param_value
except KeyError:
log.warning(
'%s specified, but %s not configured! Ignoring...'
% ("".join(["--", param_name.replace("_", "-")]), plugin_name),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this DRYer, seeing that we use it below as well? One idea might be to implement an iterator, or just encapsulate it in a function. For example:

def get_prioritized_parameters(plugins, is_default_plugins_map, prefer_default=True):
    """
    :type is_default_plugins_map: dict(str => bool)
    :param is_default_plugins_map: mapping of parameter name to whether its value is derived
        from a default value.

    :param prefer_default: if True, will yield if plugin parameters are from default values.
        Otherwise, will yield if plugin parameters are *not* from default values.
    """
    ...
    yield plugin_name, param_name, param_value

Then, you can do:

plugins_dict = dict(args.plugins)
for plugin_name, param_name, param_value in get_prioritized_parameters(
    input_plugins_dict,
    args.param_from_default,
    prefer_default=False,
):
    try:
        plugins_dict[plugin_name][param_name] = param_value
    except KeyError:
        ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been updated now. Although having more lines, but I do think the logic is a little bit more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearer logic >> more lines, especially when it comes to long term maintenance 😸


return from_parser_builder(plugins_dict)


def from_plugin_classname(plugin_classname, **kwargs):
"""Initializes a plugin class, given a classname and kwargs.

Expand Down
7 changes: 6 additions & 1 deletion detect_secrets/pre_commit_hook.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,12 @@ def main(argv=None):
return 1

plugins = initialize.from_parser_builder(args.plugins)

# Merge plugin from baseline
if baseline_collection:
plugins = initialize.merge_plugin_from_baseline(baseline_collection.plugins, args)
baseline_collection.plugins = plugins

results = find_secrets_in_files(args, plugins)
if baseline_collection:
original_results = results
Expand All @@ -59,7 +65,6 @@ def main(argv=None):
)

if VERSION != baseline_collection.version:
baseline_collection.plugins = plugins
baseline_collection.version = VERSION
baseline_modified = True

Expand Down
24 changes: 24 additions & 0 deletions tests/core/baseline_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -424,6 +424,30 @@ def test_new_results_has_nothing(self):

assert merge_results(old_result, {}) == {}

def test_old_results_have_diff_type_will_carry_over(self):
secretA = self.get_secret()
secretA["type"] = "different type"
secretB = self.get_secret()

assert merge_results(
{
'filenameA': [
secretA,
],
},
{
'filenameA': [
secretA,
secretB,
],
},
) == {
'filenameA': [
secretA,
secretB,
],
}

def test_old_results_have_subset_of_new_results(self):
secretA = self.get_secret()
secretB = self.get_secret()
Expand Down
2 changes: 1 addition & 1 deletion tests/core/secrets_collection_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -283,7 +283,7 @@ def test_output(self, mock_gmtime):

def test_load_baseline_from_string(self, mock_gmtime):
"""
We use load_baseline_from_string as a proxy to testing _load_baseline_from_dict,
We use load_baseline_from_string as a proxy to testing load_baseline_from_dict,
because it's the most entry into the private function.
"""
original = self.get_baseline_dict(mock_gmtime)
Expand Down
Loading