Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add context about failing branches during error in used_cache() #5038

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions dvc/command/gc.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ def run(self):
jobs=self.args.jobs,
repos=self.args.repos,
workspace=self.args.workspace,
skip_failing_collect=self.args.skip_failing_collect,
)
return 0

Expand Down Expand Up @@ -141,4 +142,14 @@ def add_parser(subparsers, parent_parser):
"Useful if you share a single cache across repos.",
metavar="<paths>",
)
gc_parser.add_argument(
"--skip-failing-collect",
action="store_true",
default=False,
help=(
"Skip commits were the cache cannot be "
"collected due to user errors in the dvc "
"state (eg. badly formatted dvc.yaml)."
),
)
gc_parser.set_defaults(func=CmdGC)
39 changes: 24 additions & 15 deletions dvc/repo/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from git import InvalidGitRepositoryError

from dvc.config import Config
from dvc.exceptions import FileMissingError
from dvc.exceptions import DvcException, FileMissingError
from dvc.exceptions import IsADirectoryError as DvcIsADirectoryError
from dvc.exceptions import NotDvcRepoError, OutputNotFoundError
from dvc.path_info import PathInfo
Expand Down Expand Up @@ -293,6 +293,7 @@ def used_cache(
jobs=None,
recursive=False,
used_run_cache=None,
ignore_dvc_exceptions=False,
):
"""Get the stages related to the given target and collect
the `info` of its outputs.
Expand All @@ -319,22 +320,30 @@ def used_cache(
):
targets = targets or [None]

pairs = cat(
self.stage.collect_granular(
target, recursive=recursive, with_deps=with_deps
try:
pairs = cat(
self.stage.collect_granular(
target, recursive=recursive, with_deps=with_deps
)
for target in targets
)
for target in targets
)

suffix = f"({branch})" if branch else ""
for stage, filter_info in pairs:
used_cache = stage.get_used_cache(
remote=remote,
force=force,
jobs=jobs,
filter_info=filter_info,
)
cache.update(used_cache, suffix=suffix)
suffix = f"({branch})" if branch else ""
for stage, filter_info in pairs:
used_cache = stage.get_used_cache(
remote=remote,
force=force,
jobs=jobs,
filter_info=filter_info,
)
cache.update(used_cache, suffix=suffix)
except DvcException as e:
logger.error(f"Cache for '{branch}' could not be collected")
if ignore_dvc_exceptions:
Copy link
Contributor

@efiop efiop Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like using --force would be intuitive here.

Suggested change
if ignore_dvc_exceptions:
if ignore_dvc_exceptions or force:

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest using force and ignore_dvc_exceptions or just force?
Indeed, --force seems more coherent with options across dvc commands

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@courentin Seems like --force would be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, --force is already used in the gc command "Force garbage collection - automatically agree to all prompts".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop Don't we want to differentiate --force where the behaviour is "Force garbage collection - automatically agree to all prompts" and --skip-failing-collect so that we do one without the other?

WDYT?

I'd like to merge this PR so that our team can clean their workspace :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@courentin Do you have a scenario in which those could be used separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say if we want to ignore failing commits but not agree to all prompts. But to be honest I'm fine with using only --force, I let you decide what's best for the product ;)

logger.debug(e)
continue

raise e

if used_run_cache:
used_cache = self.stage_cache.get_used_cache(
Expand Down
2 changes: 2 additions & 0 deletions dvc/repo/gc.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ def gc(
jobs=None,
repos=None,
workspace=False,
skip_failing_collect=False,
):

# require `workspace` to be true to come into effect.
Expand Down Expand Up @@ -66,6 +67,7 @@ def gc(
remote=remote,
force=force,
jobs=jobs,
ignore_dvc_exceptions=skip_failing_collect,
)
)

Expand Down