๐ŸŒ [i18n-KO] Translated fsdp.md to Korean #32261

Merged
merged 8 commits into from
Aug 8, 2024
4 changes: 2 additions & 2 deletions docs/source/ko/_toctree.yml
Expand Up @@ -170,8 +170,8 @@
  title: (In translation) Methods and tools for efficient training on a single GPU
- local: perf_train_gpu_many
  title: Training on multiple GPUs
- local: in_translation
  title: (In translation) Fully Sharded Data Parallel
- local: fsdp
  title: Fully Sharded Data Parallel
- local: in_translation
  title: (In translation) DeepSpeed
- local: perf_train_cpu
138 changes: 138 additions & 0 deletions docs/source/ko/fsdp.md
@@ -0,0 +1,138 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

โš ๏ธ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Fully Sharded Data Parallel (FSDP) [[fully-sharded-data-parallel]]

[Fully Sharded Data Parallel (FSDP)](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/) is a data parallelism method that shards a model's parameters, gradients, and optimizer states across the available GPUs (also called workers or *ranks*). Unlike [DistributedDataParallel (DDP)](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html), which replicates the full model on every GPU, FSDP reduces memory usage because each GPU holds only a shard of the model. This improves GPU memory efficiency and allows you to train much larger models on fewer GPUs. FSDP is integrated with Accelerate, a library for easily managing training in distributed environments, which means it can be used from the [`Trainer`] class.

Before you start, make sure Accelerate is installed and that you have at least PyTorch 2.1.0.

```bash
pip install accelerate
```
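Since FSDP here requires PyTorch 2.1.0 or newer, it can be worth verifying the installed version programmatically before launching a long job. Below is a small, hypothetical helper (not part of Transformers or Accelerate) that compares version strings; in practice you would pass `torch.__version__` as the first argument:

```python
# Hypothetical helper (not part of any library) that checks whether an
# installed PyTorch version string meets the FSDP minimum of 2.1.0.
def parse_version(v: str) -> tuple[int, ...]:
    # Drop local build suffixes such as "+cu118" before comparing.
    return tuple(int(part) for part in v.split("+")[0].split(".")[:3])

def fsdp_requirements_met(torch_version: str, min_torch: str = "2.1.0") -> bool:
    return parse_version(torch_version) >= parse_version(min_torch)

print(fsdp_requirements_met("2.1.0+cu118"))  # True
print(fsdp_requirements_met("2.0.1"))        # False
```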

## FSDP configuration [[fsdp-configuration]]

To start, run the [`accelerate config`](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-config) command to create a configuration file for your training environment. Accelerate uses this configuration file to automatically set up the correct training environment based on the training options you selected in `accelerate config`.

```bash
accelerate config
```

When you run `accelerate config`, you'll be prompted with a series of options to configure your training environment. This section covers some of the most important FSDP options. To learn more about the other available FSDP options, take a look at the [fsdp_config](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.fsdp_config) parameters.

### Sharding strategy [[sharding-strategy]]

FSDP offers several sharding strategies to choose from:

* `FULL_SHARD` - shards model parameters, gradients, and optimizer states across workers; select `1` for this option
* `SHARD_GRAD_OP` - shards gradients and optimizer states across workers; select `2` for this option
* `NO_SHARD` - don't shard anything (equivalent to DDP); select `3` for this option
* `HYBRID_SHARD` - shards model parameters, gradients, and optimizer states within each worker, while each worker also keeps a full copy; select `4` for this option
* `HYBRID_SHARD_ZERO2` - shards gradients and optimizer states within each worker, while each worker also keeps a full copy; select `5` for this option

This is enabled with the `fsdp_sharding_strategy` flag.
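In the generated configuration file, the chosen strategy shows up as a numeric value under `fsdp_config`. A minimal fragment (surrounding keys omitted) selecting full sharding might look like:

```yaml
fsdp_config:
  fsdp_sharding_strategy: 1  # 1 = FULL_SHARD: shard parameters, gradients, and optimizer states
```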

### CPU offload [[cpu-offload]]

You can also offload parameters and gradients to the CPU when they are not in use, saving even more GPU memory and helping you fit large models for which even FSDP alone is not enough. This is enabled by setting `fsdp_offload_params: true` when running `accelerate config`.
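In the configuration file this corresponds to a single boolean key; a minimal sketch of the relevant fragment:

```yaml
fsdp_config:
  fsdp_offload_params: true  # offload parameters and gradients to the CPU when not in use
```

Keep in mind that offloading trades GPU memory for speed, since tensors must move back and forth across the CPU-GPU interconnect.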

### Wrapping policy [[wrapping-policy]]

FSDP is applied by wrapping each layer in the network. The wrapping is usually applied in a nested way, where the full weights are discarded after each forward pass to save memory for the next layer. The *auto wrapping* policy is the simplest way to implement this, and you don't need to change any code. Select `fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP` to wrap Transformer layers, and `fsdp_transformer_layer_cls_to_wrap` to specify which layer to wrap (for example, `BertLayer`).

Alternatively, you can choose a size-based wrapping policy, where FSDP is applied to a layer if it exceeds a certain number of parameters. This is enabled by setting `fsdp_wrap_policy: SIZE_BASED_WRAP` and `min_num_param` to the desired size threshold.
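As a sketch, the two auto-wrapping variants would appear in the configuration file roughly as follows (only one variant should be set; `BertLayer` and the parameter threshold are example values, and the exact key names follow the flags described above):

```yaml
fsdp_config:
  # Variant 1: wrap by Transformer layer class
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: BertLayer
  # Variant 2 (alternative): wrap any module above a parameter-count threshold
  # fsdp_wrap_policy: SIZE_BASED_WRAP
  # min_num_param: 100000000
```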

### ์ฒดํฌํฌ์ธํŠธ [[checkpointing]]

Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT`, because saving the full state dict on rank 0 with CPU offloading enabled takes a long time and can cause `NCCL Timeout` errors from indefinite hanging during broadcasting. You can resume training from a sharded state dict with the [`~accelerate.Accelerator.load_state`] method.

```py
# directory containing checkpoints
accelerator.load_state("ckpt")
```

However, when training ends, you should save the full state dict, because the sharded state dict is only compatible with FSDP.

```py
if trainer.is_fsdp_enabled:
    trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")

trainer.save_model(script_args.output_dir)
```

### TPU [[tpu]]

[PyTorch XLA](https://pytorch.org/xla/release/2.1/index.html) supports FSDP training for TPUs, and it can be enabled by modifying the FSDP configuration file generated by `accelerate config`. In addition to the sharding strategies and wrapping options specified above, you can add the parameters shown below to the file.

```yaml
xla: True # must be set to True to enable PyTorch/XLA
xla_fsdp_settings: # XLA-specific FSDP parameters
xla_fsdp_grad_ckpt: True # use gradient checkpointing
```

The [`xla_fsdp_settings`](https://github.com/pytorch/xla/blob/2e6e183e0724818f137c8135b34ef273dea33318/torch_xla/distributed/fsdp/xla_fully_sharded_data_parallel.py#L128) allow you to configure additional XLA-specific parameters for FSDP.
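Since these settings are passed through to torch_xla's `XlaFullyShardedDataParallel` wrapper, other constructor arguments can be supplied the same way. As a hedged sketch (the `compute_dtype` key comes from that constructor, and `bfloat16` is just an example value):

```yaml
xla_fsdp_settings: # forwarded to XlaFullyShardedDataParallel
  compute_dtype: bfloat16 # run computation in bf16 (example value)
```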

## Launch training [[launch-training]]

An example FSDP configuration file may look like:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: 1
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_transformer_layer_cls_to_wrap: BertLayer
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

To launch training, run the [`accelerate launch`](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-launch) command. It automatically uses the configuration file you previously created with `accelerate config`.

```bash
accelerate launch my-trainer-script.py
```

You can also override some of the values from the configuration file directly on the command line:

```bash
accelerate launch --fsdp="full shard" --fsdp_config="path/to/fsdp_config/" my-trainer-script.py
```

## ๋‹ค์Œ ๋‹จ๊ณ„ [[next-steps]]

FSDP can be a powerful tool for training very large models, and it works with multiple GPUs or TPUs. By sharding the model parameters, optimizer and gradient states, and offloading them to the CPU when they're inactive, FSDP can reduce the high cost of large-scale training. If you'd like to learn more, the following resources may be helpful:

* Follow along with the more in-depth Accelerate guide for [FSDP](https://huggingface.co/docs/accelerate/usage_guides/fsdp).
* Read the [Introducing PyTorch Fully Sharded Data Parallel (FSDP) API](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/) blog post.
* Read the [Scaling PyTorch models on Cloud TPUs with FSDP](https://pytorch.org/blog/scaling-pytorch-models-on-cloud-tpus-with-fsdp/) blog post.