[HPU] [Train] Add a Stable Diffusion fine-tuning and serving example #45217

kira-lin · 2024-05-09T06:19:01Z

Why are these changes needed?

This PR adds an example of fine-tuning and serving a Stable Diffusion model on HPUs. It also covers how to adapt an existing HPU example to run on Ray, so that users can use Ray to run the examples in huggingface/optimum-habana.

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

aslonnie

(for train/tune team to review)

woshiyyya · 2024-05-10T20:48:35Z

doc/source/train/examples/intel_gaudi/sd.ipynb

+    "    main()\n",
+    "```\n",
+    "\n",
+    "Originally, this script will be started by MPI if multiple workers are used. But with Ray, we should setup TorchTrainer and supply a main function, which is `main()` in this example.\n",


I love these step-by-step explanations! They are crystal clear:)

doc/source/train/examples/intel_gaudi/sd.ipynb

woshiyyya · 2024-05-10T21:21:58Z

Let's also add this example to doc/source/train/examples.yml. You can refer to this PR: https://github.com/ray-project/ray/pull/44667/files.

Also, could you include the required libraries with versions, so that users can reproduce on their own?

doc/source/train/examples/intel_gaudi/gaudi_stable_diffusion_ouput.png

woshiyyya

Thanks for the contribution @kira-lin ! Generally looks good to me. Left some comments.

doc/source/train/examples.yml

doc/source/train/examples/intel_gaudi/sd.ipynb

Signed-off-by: Zhi Lin <[email protected]>

woshiyyya

LGTM. Please adopt the suggested changes to pass the doc build.

doc/source/train/examples.yml

doc/source/train/examples/intel_gaudi/sd.ipynb

Signed-off-by: Yunxuan Xiao <[email protected]>

…ion-hpu Signed-off-by: Zhi Lin <[email protected]>

woshiyyya · 2024-05-16T20:09:19Z

There seems to be a syntax error in the notebook. Can you fix that?

peytondmurray

A couple of things:

You start sd.ipynb with (hpu_sd_finetune)=. I'm not sure what is intended here, but this tells Sphinx to create a new anchor at the top of the page. IMO this is clutter, because Sphinx already lets you reference the file itself: instead of referencing hpu_sd_finetune, reference train/examples/intel_gaudi/sd and Sphinx will create a link to the file. Users who click that link are taken to the documentation built from this file, docs.ray.io/en/master/train/examples/intel_gaudi/sd.html. Your current approach creates a link to docs.ray.io/en/master/train/examples/intel_gaudi/sd.html#hpu_sd_finetune, with the anchor at the top of the page. Please see the Sphinx docs for more information about references.
Any reason this needs to be invoked in the shell (i.e. via !<command>)?

!python ~/optimum-habana/examples/stable-diffusion/training/textual_inversion.py \
  --pretrained_model_name_or_path runwayml/stable-diffusion-v1-5 \
  --train_data_dir "/root/cat" \
  --learnable_property object \
  --placeholder_token "<cat-toy>" \
  --initializer_token toy \
  --resolution 512 \
  --train_batch_size 4 \
  --max_train_steps 3000 \
  --learning_rate 5.0e-04 \
  --scale_lr \
  --lr_scheduler constant \
  --lr_warmup_steps 0 \
  --output_dir /tmp/textual_inversion_cat \
  --save_as_full_pipeline \
  --gaudi_config_name Habana/stable-diffusion \
  --throughput_warmup_steps 3

!python ~/... requires the user to be running something bash-like in order to get ~ expansion (I'm not sure whether this works on Windows). You might want to use a different cell magic here, which might solve the lexing issue as well.

The other option that might be able to help here is to set the cell metadata - there may be a way to override the way this is rendered with myst_nb.
Another option would be to put the options on one line, although this does have the impact of ruining the nice spacing that you have here.
The cell output is huge: 3000 lines of terminal progress bars. We should probably not have this appear in our documentation. You can remove cell output by setting cell metadata; see https://myst-nb.readthedocs.io/en/latest/render/format_code_cells.html.

Alternatively, we probably want to make use of nbstripout-fast as part of pre-commit hooks, but maybe this is a separate issue.
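On the `~` expansion concern raised above, one possible approach is to build the argument list in Python and resolve the home directory explicitly, so no shell is involved. The following is a minimal sketch, not what the notebook does: the helper name `build_textual_inversion_command` and its `home` parameter are hypothetical, and the flags are copied from the command quoted in this review.

```python
import sys
from pathlib import Path


def build_textual_inversion_command(home=None):
    """Return an argv list for the textual inversion script quoted above.

    `home` is a hypothetical parameter for illustration; by default the
    current user's home directory is used instead of relying on `~`.
    """
    home_dir = Path(home) if home is not None else Path.home()
    script = (
        home_dir / "optimum-habana" / "examples" / "stable-diffusion"
        / "training" / "textual_inversion.py"
    )
    return [
        sys.executable, str(script),
        "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
        "--train_data_dir", "/root/cat",
        "--learnable_property", "object",
        "--placeholder_token", "<cat-toy>",  # no shell quoting needed in a list
        "--initializer_token", "toy",
        "--resolution", "512",
        "--train_batch_size", "4",
        "--max_train_steps", "3000",
        "--learning_rate", "5.0e-04",
        "--scale_lr",
        "--lr_scheduler", "constant",
        "--lr_warmup_steps", "0",
        "--output_dir", "/tmp/textual_inversion_cat",
        "--save_as_full_pipeline",
        "--gaudi_config_name", "Habana/stable-diffusion",
        "--throughput_warmup_steps", "3",
    ]
```

The resulting list could then be passed to `subprocess.run(..., check=True)` from a regular Python cell. Because the arguments are passed as a list, the angle brackets in `<cat-toy>` are not interpreted by any shell.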

woshiyyya · 2024-05-17T18:06:14Z

Hi @peytondmurray, I was the one who added the (hpu_sd_finetune)= tag, following our old doc style. Sorry for the confusion; we can remove it if it's no longer applicable.

For (5), @kira-lin let's remove the outputs and only show the result object in a new cell.
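Removing the outputs, as requested here, could be done by hand in the notebook editor or with a tool such as nbstripout. As an illustration only, the sketch below does it with the standard library, assuming the notebook is an nbformat 4 JSON document. The function names are hypothetical. The second helper shows the alternative of keeping outputs in the file but hiding them at render time with the MyST-NB `remove-output` cell tag.

```python
import copy


def strip_code_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict with code outputs cleared."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def tag_remove_output(notebook):
    """Return a copy where every code cell carries the `remove-output` tag."""
    tagged = copy.deepcopy(notebook)
    for cell in tagged.get("cells", []):
        if cell.get("cell_type") == "code":
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if "remove-output" not in tags:
                tags.append("remove-output")
    return tagged
```

To apply either helper to a file, load it with `json.load`, pass the resulting dict through the function, and write it back with `json.dump`. Both helpers leave markdown cells untouched, so a result summary placed in a markdown cell would survive.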

Signed-off-by: Zhi Lin <[email protected]>

woshiyyya

Triggered a premerge run!

Also cc @justinvyu to take a look and merge it?

kira-lin · 2024-05-29T01:06:36Z

ping @woshiyyya @justinvyu , can we merge this?

woshiyyya · 2024-05-29T18:04:08Z

Hi @peytondmurray , can you take a look again and see if the change requests can be resolved?

peytondmurray

Just the one remaining request to remove the extra anchor. Once that's removed, 🚢

doc/source/train/examples/intel_gaudi/sd.ipynb

Co-authored-by: Peyton Murray <[email protected]> Signed-off-by: Samuel Chan <[email protected]>

…ay-project#45217) This PR adds an example for stable diffusion model fine-tuning and serving using HPU. Moreover, it also covers how to adapt an existing HPU example to run on Ray, so that users can use Ray to run the examples on huggingface/optimum-habana. --------- Signed-off-by: Zhi Lin <[email protected]> Signed-off-by: Yunxuan Xiao <[email protected]> Signed-off-by: Samuel Chan <[email protected]> Co-authored-by: Yunxuan Xiao <[email protected]> Co-authored-by: Yunxuan Xiao <[email protected]> Co-authored-by: Samuel Chan <[email protected]> Co-authored-by: Peyton Murray <[email protected]> Signed-off-by: Richard Liu <[email protected]>

kira-lin requested review from matthewdeng, justinvyu, woshiyyya and a team as code owners May 9, 2024 06:19

stable diffusion notebook

2bf3f57

aslonnie reviewed May 9, 2024

anyscalesam added the "triage" and "train" labels May 9, 2024

woshiyyya self-assigned this May 9, 2024

woshiyyya removed the "triage" label May 9, 2024

woshiyyya reviewed May 10, 2024

doc/source/train/examples/intel_gaudi/gaudi_stable_diffusion_ouput.png (outdated, resolved)

woshiyyya requested changes May 10, 2024

carsonwang reviewed May 14, 2024

doc/source/train/examples.yml (outdated, resolved)

carsonwang reviewed May 14, 2024

doc/source/train/examples/intel_gaudi/sd.ipynb (outdated, resolved)

kira-lin added 2 commits May 14, 2024 13:41

address comments: add to examples.yml

744380a

Signed-off-by: Zhi Lin <[email protected]>

change to Intel Gaudi

7bde00e

Signed-off-by: Zhi Lin <[email protected]>

woshiyyya requested changes May 14, 2024

doc/source/train/examples.yml (resolved)

doc/source/train/examples/intel_gaudi/sd.ipynb (resolved)

doc/source/train/examples/intel_gaudi/sd.ipynb (resolved)

woshiyyya and others added 2 commits May 14, 2024 15:55

Apply suggestions from code review

204feff

Signed-off-by: Yunxuan Xiao <[email protected]>

Merge remote-tracking branch 'upstream/master' into add-stable-diffus…

5e19413

…ion-hpu Signed-off-by: Zhi Lin <[email protected]>

peytondmurray requested changes May 17, 2024

kira-lin and others added 2 commits May 20, 2024 15:01

put output in markdown cell

4736b3d

Signed-off-by: Zhi Lin <[email protected]>

Merge branch 'master' into add-stable-diffusion-hpu

8b704bb

kira-lin requested a review from woshiyyya May 22, 2024 01:36

woshiyyya added the "go" label May 22, 2024

woshiyyya assigned justinvyu May 22, 2024

woshiyyya approved these changes May 22, 2024

anyscalesam added the "P1" label May 29, 2024

Merge branch 'master' into add-stable-diffusion-hpu

7929feb

anyscalesam requested review from hongpeng-guo and raulchen as code owners May 29, 2024 05:55

peytondmurray approved these changes May 29, 2024

doc/source/train/examples/intel_gaudi/sd.ipynb (outdated, resolved)

jerome-habana approved these changes May 30, 2024

anyscalesam and others added 2 commits May 30, 2024 11:20

Update doc/source/train/examples/intel_gaudi/sd.ipynb

9923722

Co-authored-by: Peyton Murray <[email protected]> Signed-off-by: Samuel Chan <[email protected]>

Merge branch 'master' into add-stable-diffusion-hpu

7a7d5d3

anyscalesam enabled auto-merge (squash) May 30, 2024 18:20

github-actions bot disabled auto-merge May 30, 2024 18:20

justinvyu merged commit 18eb433 into ray-project:master May 30, 2024
7 checks passed
