[HPU] [Train] Add a Stable Diffusion fine-tuning and serving example #45217
Conversation
(for train/tune team to review)
" main()\n", | ||
"```\n", | ||
"\n", | ||
"Originally, this script will be started by MPI if multiple workers are used. But with Ray, we should setup TorchTrainer and supply a main function, which is `main()` in this example.\n", |
I love these step-by-step explanations! They are crystal clear :)
Let's also add this example to … Also, could you include the required libraries with versions, so that users can reproduce on their own?
doc/source/train/examples/intel_gaudi/gaudi_stable_diffusion_ouput.png
Thanks for the contribution @kira-lin! Generally looks good to me. Left some comments.
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
LGTM. Please adopt the suggested changes to pass the doc build.
Signed-off-by: Yunxuan Xiao <[email protected]>
…ion-hpu Signed-off-by: Zhi Lin <[email protected]>
A couple of things:

1. You start `sd.ipynb` with `(hpu_sd_finetune)=`. I'm not sure what is intended here, but this tells Sphinx to create a new anchor at the top of the page. IMO this is clutter, because Sphinx already lets you reference the file itself automatically: instead of referencing `hpu_sd_finetune`, just make a reference to `train/examples/intel_gaudi/sd` and Sphinx will create a link to the file. Users who click on the link will be taken to the documentation built from this file: `docs.ray.io/en/master/train/examples/intel_gaudi/sd.html`; your current approach will create a link to `docs.ray.io/en/master/train/examples/intel_gaudi/sd.html#hpu_sd_finetune`, with the anchor being at the top of the page. Please see the Sphinx docs for more information about references.

2. Any reason this needs to be invoked in the shell (i.e. via `!<command>`)?

   ```shell
   !python ~/optimum-habana/examples/stable-diffusion/training/textual_inversion.py \
     --pretrained_model_name_or_path runwayml/stable-diffusion-v1-5 \
     --train_data_dir "/root/cat" \
     --learnable_property object \
     --placeholder_token "<cat-toy>" \
     --initializer_token toy \
     --resolution 512 \
     --train_batch_size 4 \
     --max_train_steps 3000 \
     --learning_rate 5.0e-04 \
     --scale_lr \
     --lr_scheduler constant \
     --lr_warmup_steps 0 \
     --output_dir /tmp/textual_inversion_cat \
     --save_as_full_pipeline \
     --gaudi_config_name Habana/stable-diffusion \
     --throughput_warmup_steps 3
   ```

   `!python ~/...` requires the user to be running something bash-like (to have `~` expansion; not sure if this works on Windows). You might want to use a different cell magic here, which might solve the lexing issue as well.

3. The other option that might be able to help here is to set the cell metadata; there may be a way to override the way this is rendered with `myst_nb`.

4. Another option would be to put the options on one line, although this does have the impact of ruining the nice spacing that you have here.

5. The cell output is huge: 3000 lines of terminal progress bars. We should probably not have this appear in our documentation. You can remove cell output by setting cell metadata; see https://myst-nb.readthedocs.io/en/latest/render/format_code_cells.html. Alternatively, we probably want to make use of `nbstripout-fast` as part of pre-commit hooks, but maybe this is a separate issue.
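Concretely, the output-stripping suggestion could look like this in the notebook JSON — a sketch only; the `remove-output` cell tag is the one documented by `myst_nb`, and the other cell fields are abbreviated here:

```json
{
  "cell_type": "code",
  "metadata": {
    "tags": ["remove-output"]
  },
  "source": [
    "!python ~/optimum-habana/examples/stable-diffusion/training/textual_inversion.py ..."
  ]
}
```

At build time `myst_nb` would then render the cell's source but drop the stored output, so the 3000 lines of progress bars never reach the published page.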
Hi @peytondmurray, it's me that added this … For (5), @kira-lin let's remove the outputs and only show the result object in a new cell.
Signed-off-by: Zhi Lin <[email protected]>
Triggered a premerge run!
Also cc @justinvyu to take a look and merge it?
ping @woshiyyya @justinvyu, can we merge this?
Hi @peytondmurray, can you take a look again and see if the change requests can be resolved?
Just the one remaining request to remove the extra anchor. Once that's removed, 🚢
Co-authored-by: Peyton Murray <[email protected]> Signed-off-by: Samuel Chan <[email protected]>
…ay-project#45217) This PR adds an example for stable diffusion model fine-tuning and serving using HPU. Moreover, it also covers how to adapt an existing HPU example to run on Ray, so that users can use Ray to run the examples on huggingface/optimum-habana. --------- Signed-off-by: Zhi Lin <[email protected]> Signed-off-by: Yunxuan Xiao <[email protected]> Signed-off-by: Samuel Chan <[email protected]> Co-authored-by: Yunxuan Xiao <[email protected]> Co-authored-by: Yunxuan Xiao <[email protected]> Co-authored-by: Samuel Chan <[email protected]> Co-authored-by: Peyton Murray <[email protected]> Signed-off-by: Richard Liu <[email protected]>
Why are these changes needed?
This PR adds an example for stable diffusion model fine-tuning and serving using HPU. Moreover, it also covers how to adapt an existing HPU example to run on Ray, so that users can use Ray to run the examples on huggingface/optimum-habana.
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.