
Consistent interface for algorithms & single training script #79

Open
AdamGleave opened this issue Aug 14, 2019 · 17 comments
Labels
enhancement New feature or request
Milestone

Comments

@AdamGleave
Member

Currently we don't have any CLI script for behavioral cloning or the density baseline. I envisage this codebase as being particularly useful for rapidly benchmarking against a wide variety of algorithms. For this to be possible, we need (as far as possible) a consistent interface between the different imitation learning algorithms, just as Stable Baselines provides a consistent interface across all the RL algorithms it implements. Ideally we'd then also have imitation.scripts.train work for all of them, although there will clearly need to be some algorithm-specific configs.

@shwang
Member

shwang commented Aug 28, 2021

We now have an SB3-like .train(total_timesteps) interface for GAIL, AIRL, DAgger, and BC. We also have Sacred scripts for each of these algorithms, though they are not unified into a single train script.
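For concreteness, roughly what that looks like from the user's side (a sketch rather than exact constructor signatures; `rollouts`, `venv`, `rl_algo` and `reward_net` are assumed to be set up elsewhere):

```python
# Sketch only: argument names are abbreviated and may not match the library exactly.
from imitation.algorithms.adversarial.gail import GAIL

trainer = GAIL(
    demonstrations=rollouts,  # expert trajectories
    demo_batch_size=64,
    venv=venv,                # vectorized training environment
    gen_algo=rl_algo,         # e.g. a stable_baselines3 PPO instance
    reward_net=reward_net,
)
trainer.train(total_timesteps=100_000)  # analogous to SB3's .learn(total_timesteps)
```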

@AdamGleave
Member Author

We've made a lot of progress since this issue was first opened, but there's still room for improvement, so I'm leaving this open for now.

@ernestum
Collaborator

ernestum commented Jan 24, 2023

I think this issue is required before we can address #587 and start a new take on #602.
I reviewed the scripts and these are some points we should address:

This is the current relationship between scripts and algorithms:

| Algo | Script | Comments |
| --- | --- | --- |
| DAgger, BC | train_imitation | BC is a "hidden feature", enabled by setting use_dagger=False |
| MCE IRL | - | Do we need a script for this? |
| AIRL, GAIL | train_adversarial | Throwing them together into one script results in lots of weird constructs, such as an algorithm_specific configuration that is merged into the main config using dubious hacks, and configuration being modified at runtime. |
| PreferenceComparisons | train_preference_comparisons | - |

There is a reward ingredient that should probably be called reward_net. It takes care of edge cases for AIRL, which I think is a bad idea. It also has some named configs that merely enable/disable individual options; I think this is something that should be handled by a config_hook.
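For concreteness, a rough Sacred sketch of what I have in mind (all names and defaults here are made up, and the exact config-hook semantics would need checking):

```python
from sacred import Experiment, Ingredient

# Hypothetical ingredient: it only describes the reward network and carries
# no AIRL-specific edge cases.
reward_net_ingredient = Ingredient("reward_net")

@reward_net_ingredient.config
def reward_net_defaults():
    net_cls = "BasicRewardNet"  # illustrative default
    normalize_input = True

train_adversarial_ex = Experiment(
    "train_adversarial", ingredients=[reward_net_ingredient]
)

@train_adversarial_ex.config_hook
def airl_adjustments(config, command_name, logger):
    # Algorithm-specific adjustments live in a config hook on the script,
    # not inside the reward_net ingredient itself.
    updates = {}
    if config.get("algorithm") == "airl":
        updates["reward_net"] = {"net_cls": "BasicShapedRewardNet"}
    return updates
```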

There is a train ingredient which should be split up into a policy and an evaluation ingredient. Ingredients are objects, not verbs.

We should probably have an ingredient for each algorithm. This makes constructing more complex experiments (such as warm-starting) easier.
I also would vote for one train_<algo> script per algorithm, following the Unix philosophy that each tool should do only one job.
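To illustrate the kind of layout I'm imagining, here is a rough sketch of a thin per-algorithm script composed from reusable ingredients (all names and config fields are made up):

```python
from sacred import Experiment, Ingredient

policy_ingredient = Ingredient("policy")          # the object being trained
evaluation_ingredient = Ingredient("evaluation")  # how the result is evaluated
bc_ingredient = Ingredient("bc")                  # algorithm-specific settings

@policy_ingredient.config
def policy_defaults():
    policy_cls = "MlpPolicy"

@evaluation_ingredient.config
def evaluation_defaults():
    n_episodes_eval = 10

@bc_ingredient.config
def bc_defaults():
    batch_size = 32

# The script itself only wires the ingredients together, so its config
# contains nothing that is specific to any other algorithm.
train_bc_ex = Experiment(
    "train_bc",
    ingredients=[policy_ingredient, evaluation_ingredient, bc_ingredient],
)

@train_bc_ex.main
def train_bc(policy, evaluation, bc):
    # Each argument resolves to the config of the corresponding ingredient.
    print(policy, evaluation, bc)

if __name__ == "__main__":
    train_bc_ex.run_commandline()
```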

@AdamGleave
Member Author

AdamGleave commented Jan 31, 2023

> We should probably have an ingredient for each algorithm. This makes constructing more complex experiments (such as warm-starting) easier.

Note sacred.Experiment is-a sacred.Ingredient (parent class), so I think we already have this more or less? EDIT: After seeing #663 I do think there's a case for making BC an ingredient if we want to use it to warm-start a variety of other algorithms, and because it already has a compositional relationship with DAgger. BC might be a bit of a special case in that regard though.

> I also would vote for one train_<algo> script per algorithm, following the Unix philosophy that each tool should do only one job.

I feel torn here. It's nice to be able to do benchmarks with a variety of algorithms using a consistent interface. Having one script with different subcommands seems more natural here. You could split it up into different scripts, but it seems like a bit of a nightmare to maintain a consistent interface across many files, although perhaps doable if we can factor out most things into ingredients?

However, the algorithms' fundamental properties differ enough that we can't literally have the same interface for all of them. The current approach was a compromise, grouping similar algorithms together: e.g. AIRL/GAIL are almost identical except for the parametric form of the reward network. But I agree the way they (and BC/DAgger) are currently merged is pretty ad hoc.

Before we do a complete rewrite of the scripts, I do think it's worth reflecting on whether we want to keep building on top of Sacred. I do like it, but the project is sadly basically abandonware at this point.

@ernestum
Collaborator

Good point that each Experiment is actually an Ingredient. I will double-check where it makes sense to turn algorithms into ingredients. It would be pointless to have e.g. a DAggerExperiment that only has a single DAggerIngredient.

IMO the consistent interface should result from a consistent design of the ingredients and their configuration. The scripts just pull the ingredients together to form experiments. They allow you to run an experiment with the given ingredients, but also to explore what configuration values it needs. When I run python -m imitation.scripts.<some_script> print_config I expect it to only show me the configuration specific to what <some_script> does. E.g. right now we use train_imitation to train both BC and DAgger. If I am only interested in training BC, I don't want to be bothered with the configuration values for DAgger in the print_config output.

Throwing AIRL and GAIL together eventually resulted in some weird design choices (see the comments in the table above). It is tempting to fuse them since they are so similar, but I feel like the decision to generalize here (which is a good idea on its own) required so many hacks that they outweighed the gained elegance/reduction of redundancy.

I totally agree that we should review the usage of Sacred before we put too much time into refactoring the scripts.
Why do you believe that Sacred is abandoned? Looking at their releases page, it does not seem too bad.
Alternatives to consider are Guild AI and MLflow. After a quick glance, Guild AI looks more promising, but we should investigate this more in depth.

@AdamGleave
Member Author

> E.g. right now we use train_imitation to train both BC and DAgger. If I am only interested in training BC, I don't want to be bothered with the configuration values for DAgger in the print_config output.

That's a good point; I think I'm convinced it's worth splitting up the scripts, so long as we can avoid substantial code duplication by splitting things into ingredients.

> Throwing AIRL and GAIL together eventually resulted in some weird design choices (see the comments in the table above). It is tempting to fuse them since they are so similar, but I feel like the decision to generalize here (which is a good idea on its own) required so many hacks that they outweighed the gained elegance/reduction of redundancy.

I agree there are a lot of hacks. I still feel a bit uncertain about this; it's probably easier to discuss with a concrete idea of how to split them up, what can be shared via ingredients, etc. It would be nice to have an easy way to share configs between AIRL/GAIL, as the right hyperparameters are often quite similar, although this use case is less common now that we're using automatically tuned hyperparameters (the configs saved in benchmarks/).

> I totally agree that we should review the usage of Sacred before we put too much time into refactoring the scripts. Why do you believe that Sacred is abandoned? Looking at their releases page, it does not seem too bad. Alternatives to consider are Guild AI and MLflow. After a quick glance, Guild AI looks more promising, but we should investigate this more in depth.

Abandoned was too strong a word. The original maintainers of Sacred stepped back from the project and did not hand over cleanly. Currently @thequilo has been doing a heroic job reviewing and fixing outstanding issues, and things are improving now that he's got access to make PyPI releases. However, the situation is a bit precarious: if @thequilo leaves there's no one else, and he still doesn't have full permissions on the project. IDSIA/sacred#871 (comment) and IDSIA/sacred#879 have some relevant discussion.

Thanks for looking into alternatives. From a quick skim of Guild AI's docs I agree it seems promising, though there are some design decisions I feel hesitant about (e.g. treating global constants as flags by default...).

@thequilo

@AdamGleave is not wrong: there is a risk of sacred becoming unmaintained when I step away. As I am currently a Ph.D. student, and it's completely unclear what will happen after I finish my Ph.D., this might happen in the not-so-far future. But since my lab relies heavily on sacred for all experiment tracking, I'm sure we'll find someone from my lab to continue maintaining sacred, given that Qwlouse gives us access.

@AdamGleave
Member Author

Thanks for clarifying @thequilo -- good to know someone else in your lab is likely to pick it up. And good luck wrapping up your PhD!

@ernestum
Collaborator

I had a deeper look at Guild AI, MLflow and neptune.ai. They are all mostly geared toward logging, analysis and reproducibility. Compared to sacred, they lack a (hierarchical) configuration management system with experiments and (reusable) ingredients.

I think we have two options here (I would be interested in your opinion, @AdamGleave; I lean towards option 1):

1. Stick with sacred

... and hope that it stays maintained. Maybe put some of our time into maintaining it if necessary, maybe push the rl-baselines-zoo to also use sacred.

Pro

  • avoids lots of code duplication
  • easy to add new parameters ad-hoc (no need to change config files for that)

Con

  • we depend on sacred and it is unclear for how long it will be maintained
  • the nested nature of experiments/ingredients invites (often unnecessary) complexity in the experiment scripts, which might sometimes outweigh the benefits of reduced code duplication

2. Use any of the other frameworks

... (probably Guild AI) and use their configuration system (or something else like argparse/click)

Pro

  • We don't depend on sacred (and if using Guild AI, we even minimize our dependency on any one particular framework)
  • Other tools support things like HP tuning/grid runs out of the box (no need to maintain our own parallel.py script)

Con

  • We need to rewrite all our scripts
  • Configuration of experiments will become harder (we need to do everything with argparse/click/global variables and there is no support for nested config scopes)

@AdamGleave
Member Author

I'll try to look at this next week. @qxcv, any thoughts on the above? I think you've used Sacred a fair bit.

@qxcv
Member

qxcv commented Mar 16, 2023

I don't have a strong opinion on MLFlow/Guild AI/neptune.ai. I have used and like click (for simple stuff), especially over argparse. As for Sacred, I only used it for the EIRLI project. I've since moved away from using Sacred because (1) I only needed config parsing, and not the more sophisticated features (like tracking git revisions, logging to S3/wandb/whatever, etc.), (2) I couldn't get the configs-as-functions stuff to play well with type checking, and (3) I found argument parsing/option override behaviour hard to reason about.

My current go-to is Hydra with structured configs. Structured configs mean that your config is just a dataclass. This was an improvement for me, but still has shortcomings, like poor support for union types and a slightly awkward API for turning on/off "magic" features. Also it may not do some of the things you want if you're already using all the features in Sacred. It has 7k GitHub stars & is used/maintained by Facebook, though, so I expect that it will get better over time. It also has a surprising number of plugins/integrations (submitit/Slurm, Ray, etc.).
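A minimal example of the structured-config style (the config fields here are just placeholders, not a proposal for imitation's actual config):

```python
from dataclasses import dataclass, field

import hydra
from hydra.core.config_store import ConfigStore

@dataclass
class BCConfig:
    batch_size: int = 32
    n_epochs: int = 10

@dataclass
class TrainConfig:
    env_name: str = "CartPole-v1"
    bc: BCConfig = field(default_factory=BCConfig)

cs = ConfigStore.instance()
cs.store(name="train_config", node=TrainConfig)

@hydra.main(version_base=None, config_name="train_config")
def main(cfg: TrainConfig) -> None:
    # cfg behaves like the nested dataclass; command-line overrides such as
    # `bc.batch_size=64` are validated against the declared fields.
    print(cfg.bc.batch_size)

if __name__ == "__main__":
    main()
```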

@ernestum
Collaborator

I think we mostly use sacred for configuration management and then W&B for logging? @AdamGleave, you probably know better which other features of sacred are used, but in the code it is mostly about configuration.

I will have a look at Hydra, thanks for the hint! What made you move from click to Hydra?

@qxcv
Member

qxcv commented Mar 17, 2023

> What made you move from click to Hydra?

Mostly the fact that I could declare my config as a (nested) dataclass & pass around the appropriate dataclass for each part of the code. Doing that with Click would require a lot of code duplication (repeating arguments as both Click options and dataclass members) and some extra parsing magic (to handle the nested options).
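The nice part is that each component only takes the slice of the config it actually needs, e.g. (reusing the made-up BCConfig from the earlier sketch):

```python
from dataclasses import dataclass

@dataclass
class BCConfig:
    batch_size: int = 32
    n_epochs: int = 10

def train_bc(cfg: BCConfig) -> None:
    # Only the BC-specific slice of the config is visible here, and a type
    # checker knows exactly which fields exist.
    for _ in range(cfg.n_epochs):
        ...

# The Hydra entry point would call train_bc(cfg.bc), so the nesting of the
# command-line overrides mirrors the nesting of the dataclasses.
train_bc(BCConfig(batch_size=64))
```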

The Sacred-like experiment management features in Hydra (automatically creating experiment directories, automatically setting up the logging module) are sometimes mildly convenient, but honestly something like this library would have solved my problems just as well.

@ernestum
Collaborator

ernestum commented Mar 20, 2023

I think the idea of putting configs in dataclasses is interesting.
I could implement a draft port of one of the scripts to Hydra to explore any unforeseen difficulties.
What do you think, @AdamGleave?

Edit: looks like Hydra would let us get rid of the parallel.py script!

@AdamGleave
Member Author

I'm in favor of trying to port one script to Hydra to try it out.

Eliminating parallel.py would be nice, I wrote that as a temporary hacky solution and somehow it's stuck around.

One feature Hydra seems to be missing is logging the Git commit hash and other things needed for reproducibility. That said, it seems like one could add that via callbacks, and WandB already does it, so I don't think we need our config framework to do it.

ernestum mentioned this issue Apr 14, 2023
@ernestum
Collaborator

Feel free to have a look at #703 to see how it turned out.

@qxcv
Member

qxcv commented Apr 26, 2023

Wow, thanks for all that work converting things to Hydra! It's using a lot of patterns that I didn't even know existed, like instantiating objects with _target_. I haven't reviewed #703 in detail, but the PR seems very cool from just a skim.
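For anyone else following along, the `_target_` pattern is roughly this (the target class and arguments below are just an example):

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "reward_net": {
            "_target_": "torch.nn.Linear",  # any importable callable can go here
            "in_features": 4,
            "out_features": 1,
        }
    }
)

# instantiate() imports the target and calls it with the remaining keys,
# i.e. this is equivalent to torch.nn.Linear(in_features=4, out_features=1).
reward_net = instantiate(cfg.reward_net)
print(reward_net)
```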

ernestum added this to the Release v1 milestone May 24, 2023
ernestum modified the milestones: Release v1.0, Release v1.x Jul 28, 2023