Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp init: interactive and explicit options #6637

Closed

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Sep 17, 2021

On top of #6630.

Defaults from the config is not supported yet.

The --interactive will ask for the options that are specified in the template file and will fall back to the default values from the command flags + configs (not implemented yet) + hardcoded defaults.

When --explicit is used, it won't fall back to the defaults from config or hardcoded values. Mixed with --interactive, it won't allow users to proceed ahead without answering. This is against the spec, that particular values should be skipped, but with templates, it's not possible to do that.

Custom templates are supported, whatever it is created in .dvc/stages. You can create default templates in .dvc/stages by running python -m dvc.repo.experiments.init, but it can also fallback to the default files it keeps in resources/stages.

The template is not usable yet in packages nor it will
be initialized during 'dvc init' though. But it supports
custom templates if user adds the template in .dvc/stages
or in dvc's codebase on resources/stages.
@dberenbaum
Copy link
Collaborator

dberenbaum commented Sep 17, 2021

When --explicit is used, it won't fall back to the defaults from config or hardcoded values. Mixed with --interactive, it won't allow users to proceed ahead without answering. This is against the spec, that particular values should be skipped, but with templates, it's not possible to do that.

This could be a problem unless we implement optional outputs (#4410), and we might need optional dependencies also. Otherwise, it seems like this will break a lot of simple scenarios where users don't have plots or some other part of the stage template.

I was thinking that we could have a template with conditionals, but behavior for custom templates that don't implement those conditionals could be confusing 🤔. The spec used jinja because it's familiar, flexible, and doesn't require custom DSL/templating. However, usability is more important than flexibility since this feature is about solving simple use cases. Also, we already have custom templating for dvc.yaml and plots, so maybe it's better to reuse one of those.

If we reuse dvc.yaml-style templating, we can keep the template as valid yaml, right?

cmd: ${cmd}
deps:
- ${code}
- ${data}
params:
- ${params}
outs:
- ${models}
metrics:
- ${metrics}:
    cache: false
plots:
- ${plots}:
    cache: false

I assume this provides more control to do things like:

  • Expand the parameters without relying on user's templating logic.
  • If a value is missing, remove it (for example, data).
  • If a value is missing and it's the only item in the section, remove that whole section (for example, plots).
  • Maybe support multiple inputs (for example, dvc exp init --data=train,val).
  • Provide better template validation (don't need to render to validate, can provide more helpful errors).

Let me know if you have other ideas. The spec defines expected usage, but implementation is up to you.

@skshetry
Copy link
Member Author

skshetry commented Sep 20, 2021

Either jinja or any other custom template, I am not super convinced that we need one. Playing with this current implementation, the template has been only useful to specify cache: false stuff (where we can have reasonable defaults) and this optional dependency thing, of course. It seems that we are complicating the implementation for a simple feature just because of the lack of optional dependencies support (#4410).

For working around optional dependency, maybe we can just allow user to skip some of the options?
In dvc exp init -i, it would allow users to skip by entering n/Ctrl + D:

$ dvc exp init -i
Command to run: python script.py
Path to data file/directory [data, n to skip]: 
Path to source file/directory [src, n to skip]:
...

And, in non-interactive mode, we can have --ignore-missing/--explicit/--no-defaults options.

@dberenbaum
Copy link
Collaborator

Either jinja or any other custom template, I am not super convinced that we need one. Playing with this current implementation, the template has been only useful to specify cache: false stuff (where we can have reasonable defaults)

I was thinking it could be useful for other reasons:

  • Configuring other options like persist: true, checkpoint: true, plots options, live options.
  • Making it easy to see what the stage will look like.

However, you are right that it's probably not essential, especially in an initial release, if the optional outputs issue is resolved.

It seems that we are complicating the implementation for a simple feature just because of the lack of optional dependencies support (#4410).

For working around optional dependency, maybe we can just allow user to skip some of the options?
In dvc exp init -i, it would allow users to skip by entering n/Ctrl + D

Would it be better to support optional dependencies and outputs and not need to implement any workaround? All of the workarounds so far seem clunky (having different behavior for interactive mode and non-interactive mode seems odd to me, and behavior of dvc exp init -i --ignore-missing seems unclear).

@skshetry
Copy link
Member Author

skshetry commented Sep 21, 2021

  • Configuring other options like persist: true, checkpoint: true, plots options, live options.

For dvc exp init, I don't think configuring matters that much, considering this will be users' first stage. Users can change them later on anyway (we should probably add a link to the guide at the end). We don't recommend persist, plots have good defaults already, live has the same. checkpoint: true makes sense, maybe it should be set to even true by default.

Would it be better to support optional dependencies and outputs and not need to implement any workaround? All of the workarounds so far seem clunky (having different behavior for interactive mode and non-interactive mode seems odd to me, and behavior of dvc exp init -i --ignore-missing seems unclear).

Even without optional dependencies, we may want to give users the choice of skipping some stuff. We could have skip on non-interactive mode with --no-{code,models,...} flag as well if we need individual skips.


Thinking about templates more, we may just end up using the templates just for our own purpose of skipping some stuff. Considering this is targeted at the new users, will they even use it? Even if they do, are we not directing them to learn more complicated stuff? Maybe guiding them to the docs/UG is a better thing to do here.

@dberenbaum
Copy link
Collaborator

I agree that we can forget about obscure options like persist: true and even custom templates altogether if it helps. The intended use case for custom templates was to avoid boilerplate for more advanced users (avoiding copying dvc.yaml between projects or doing the same dvc stage add), but you make a good point that there's not that many other ways to configure the stage that seem all that likely. As I said above, usability is more important than flexibility here.

Even without optional dependencies, we may want to give users the choice of skipping some stuff. We could have skip on non-interactive mode with --no-{code,models,...} flag as well if we need individual skips.

With optional dependencies, we wouldn't need any of this, right? Since this command should focus on simplicity, I would try to avoid lots of options like --no-code.

@dberenbaum
Copy link
Collaborator

dberenbaum commented Sep 21, 2021

Thinking more about how optional outputs/dependencies would work here. Would they be explicitly marked as optional: true? If everything has to be marked optional, is this any less clunky than other options? Also, if we are looking to gently onboard users, this seems to encourage marking everything optional and raises questions about why optional deps/outs need to be explicitly marked.

Maybe we are better off keeping the --ignore-missing/--explicit option.

Some ideas for interactive mode:

  • Still requiring --ignore-missing/--explicit.
  • Path to data file/directory [data, n to skip]: . I think we can work on the language to make the behavior more clear.
  • Using auto-suggestion (https://github.com/prompt-toolkit/python-prompt-toolkit/blob/master/examples/prompts/auto-suggestion.py) or pre-filled input for configured values that users can override.
  • Asking after every blank input what the desired behavior is: Do you want to use the configured path "data" or omit this?
  • Asking at the beginning whether to ignore configured values: Do you want to use the following configured paths or omit them?
  • Reorganizing the inputs like:
> Select a path to configure:
> 1. Code file or directory on which your experiment depends: src/train.py
> 2. Data file or directory on which your experiment depends: data
> 3. Parameters file on which your experiment depends: params.yaml
> 4. Model file or directory generated by your experiment: models
> 5. Metrics file for your experiment: metrics.json
> 6. Plots directory or file for your experiment: plots
6
> Enter the path to the plots directory or file for your experiment [or leave blank to omit]:

Interactive mode should probably also include a question about template/whether this is a DVCLive experiment (for logging checkpoints in deep learning).

@skshetry
Copy link
Member Author

Asking after every blank input what the desired behavior is

Asking after every blank input maybe too much, I think we can just ask at the end for confirmation. I think we can learn from poetry init here:

$ poetry init

This command will guide you through creating your pyproject.toml config.

Package name [dvc]:
Version [0.1.0]:
Description []:
Author [AuthorName <[email protected]>, n to skip]:  user
License []:
Compatible Python versions [^3.8]:

Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file

[tool.black]
line-length = 79
include = '\.pyi?$'
exclude = '''
/(
    \.eggs
  | \.git
  | \.hg
  | \.mypy_cache
  | \.tox
  | \.venv
  | _build
  | buck-out
  | build
  | dist
)/
'''

[tool.poetry]
name = "dvc"
version = "0.1.0"
description = ""
authors = ["user"]

[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"


Do you confirm generation? (yes/no) [yes] yes

Using auto-suggestion

It may be helpful if we add support for path suggestion, but I'd like to leave it for minimal implementation. Could think of it later.

@dberenbaum
Copy link
Collaborator

dberenbaum commented Sep 22, 2021

The poetry confirmation seems more like if dvc prints what will be added to dvc.yaml and asks for confirmation at the end (which could be a good idea). It doesn't change that output.

I think some version of your original Path to data file/directory [data, n to skip]: might be best since it doesn't generate additional prompts for each input, it shows the default path, and it gives users granular control over each path. There are probably lots of projects like a src directory, models at a bespoke path, and no plots. This approach seems like the easiest way to accommodate those projects.

Suggested wording: Path to data file/directory on which the experiment depends [leave blank for "data", type "n" to omit]: (we might need to state "on which the experiment depends" or similar since "data" is very generic). We can also tweak this later if we agree on the blank vs. "n" input behavior.

Edit: Yes, I realize this is all the behavior you proposed in #6637 (comment) 😄 .

@skshetry
Copy link
Member Author

This is stale, superseded by #6681.

@skshetry skshetry closed this Sep 24, 2021
@skshetry skshetry deleted the exp-init-interactive-explicit branch September 24, 2021 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants