Add Command-Line Argument for Specifying Wandb Directory #466

Merged
merged 2 commits into from
Jun 21, 2024

nqhq-lou

Issue:
This PR addresses issue #433, which involves adding a command-line argument to specify the wandb directory.

Changes:

  1. Added --wandb_dir in mace.tools.arg_parser.build_default_arg_parser(). It defaults to None, which leaves wandb's default directory behavior unchanged.
  2. Added a directory keyword to mace.tools.torch_tools.init_wandb() and updated all associated function calls to pass it.
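
To illustrate the two changes, here is a minimal sketch of how such a flag could be wired through to wandb. It is not the code from this PR: build_parser_sketch, wandb_init_kwargs and init_wandb_sketch are hypothetical names, and only the wandb-related flags from the commands below are reproduced. The one library fact relied on is that wandb.init() accepts a dir keyword, where None means wandb picks its default location.

```python
import argparse
from typing import Any, Dict


def build_parser_sketch() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the real parser; only wandb flags are shown."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--wandb", action="store_true", default=False)
    # Default None: nothing is overridden unless the user asks for it.
    parser.add_argument(
        "--wandb_dir",
        type=str,
        default=None,
        help="directory for wandb run files (None keeps wandb's default)",
    )
    parser.add_argument("--wandb_project", type=str, default="")
    parser.add_argument("--wandb_entity", type=str, default="")
    parser.add_argument("--wandb_name", type=str, default="")
    return parser


def wandb_init_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Collect the keyword arguments that would be forwarded to wandb.init()."""
    return {
        "project": args.wandb_project,
        "entity": args.wandb_entity,
        "name": args.wandb_name,
        "dir": args.wandb_dir,  # None lets wandb choose its usual location
    }


def init_wandb_sketch(args: argparse.Namespace):
    """Start a run; wandb is imported lazily so parsing works without it."""
    import wandb  # assumption: wandb is installed in the training environment

    return wandb.init(**wandb_init_kwargs(args))
```

With this shape, omitting --wandb_dir forwards dir=None, so runs that do not use the flag behave as before.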

Code behavior:

with --wandb_dir:

  • command:
mace_run_train --name=dbg_MACE_train_11267359_0 \
--log_dir=./runs/dbg_MACE_train_11267359_0 --model_dir=./runs/dbg_MACE_train_11267359_0 \
--checkpoints_dir=./runs/dbg_MACE_train_11267359_0 --results_dir=./runs/dbg_MACE_train_11267359_0 \
... \
--wandb --wandb_dir=./runs/dbg_MACE_train_11267359_0 --wandb_project=[project_name] --wandb_entity=[entity_name] --wandb_name=[wandb_name]
  • directory structure:
.
├── logs
│   ├── dbg_MACE_train.11267359_0.err
│   └── dbg_MACE_train.11267359_0.out
├── runs
│   └── dbg_MACE_train_11267359_0
│       ├── dbg_MACE_train_11267359_0_run-42_epoch-26.pt
│       ├── dbg_MACE_train_11267359_0_run-42.log
│       ├── dbg_MACE_train_11267359_0_run-42_train.txt
│       ├── node_usage.11267359_0.log
│       └── wandb
│           ├── debug-internal.log -> run-20240617_152910-sw6ffqpr/logs/debug-internal.log
│           ├── debug.log -> run-20240617_152910-sw6ffqpr/logs/debug.log
│           ├── latest-run -> run-20240617_152910-sw6ffqpr
│           └── run-20240617_152910-sw6ffqpr
│               ├── files
│               │   ├── conda-environment.yaml
│               │   ├── config.yaml
│               │   ├── output.log
│               │   ├── requirements.txt
│               │   ├── wandb-metadata.json
│               │   └── wandb-summary.json
│               ├── logs
│               │   ├── debug-internal.log
│               │   └── debug.log
│               ├── run-sw6ffqpr.wandb
│               └── tmp
│                   └── code
├── train.sbatch.sh
└── wandb
    └── debug-cli.zklou.log
  • As expected, the wandb metadata directory is relocated to the path given by --wandb_dir.

without --wandb_dir:

  • command:
mace_run_train --name=dbg_MACE_train_11267416_0 \
--log_dir=./runs/dbg_MACE_train_11267416_0 --model_dir=./runs/dbg_MACE_train_11267416_0 \
--checkpoints_dir=./runs/dbg_MACE_train_11267416_0 --results_dir=./runs/dbg_MACE_train_11267416_0 \
... \
--wandb --wandb_project=[project_name] --wandb_entity=[entity_name] --wandb_name=[wandb_name]
  • directory structure:
.
├── logs
│   ├── dbg_MACE_train.11267416_0.err
│   └── dbg_MACE_train.11267416_0.out
├── runs
│   └── dbg_MACE_train_11267416_0
│       ├── dbg_MACE_train_11267416_0_run-42_epoch-60.pt
│       ├── dbg_MACE_train_11267416_0_run-42.log
│       ├── dbg_MACE_train_11267416_0_run-42_train.txt
│       └── node_usage.11267416_0.log
├── train.sbatch.sh
└── wandb
    ├── debug-cli.zklou.log
    ├── debug-internal.log -> run-20240617_153307-sfclv2b3/logs/debug-internal.log
    ├── debug.log -> run-20240617_153307-sfclv2b3/logs/debug.log
    ├── latest-run -> run-20240617_153307-sfclv2b3
    └── run-20240617_153307-sfclv2b3
        ├── files
        │   ├── conda-environment.yaml
        │   ├── config.yaml
        │   ├── output.log
        │   ├── requirements.txt
        │   ├── wandb-metadata.json
        │   └── wandb-summary.json
        ├── logs
        │   ├── debug-internal.log
        │   └── debug.log
        ├── run-sfclv2b3.wandb
        └── tmp
            └── code
  • There is no impact on the default wandb logging directory, so existing behavior is preserved.

@ilyes319 ilyes319 changed the base branch from main to develop June 17, 2024 14:12
@nqhq-lou
Author

@ilyes319 Hi Ilyes, how does this PR look? I checked the output logs for the failed tests, and they report some package errors. This PR is quite simple, so I think it should be safe to merge.

@ilyes319
Contributor

Yep, can you just update your fork with the latest develop? That should make the tests pass.

@nqhq-lou
Author

Just merged branch develop into this branch! Please approve the testing workflow.

@ilyes319 ilyes319 merged commit ed0ff02 into ACEsuit:develop Jun 21, 2024