
error occur in the resize_embedding #32196

Closed · 2 of 4 tasks
Gaiejj opened this issue Jul 24, 2024 · 8 comments

@Gaiejj
Gaiejj commented Jul 24, 2024

System Info

  • transformers version: 4.43.1
  • Platform: Linux-5.15.0-1040-nvidia-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • Huggingface_hub version: 0.24.1
  • Safetensors version: 0.4.3
  • Accelerate version: 0.33.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H800

Who can help?

@ArthurZucker When using DeepSpeed ZeRO-3 to train the llama2-7b-hf model, I encountered an error during the resize_embedding step that I couldn't resolve. The llama2-7b-hf tokenizer lacks a pad_token, so I specified a default value for it, which requires resizing the embedding. This resize works correctly in transformers 4.41.2 but fails in 4.43.0.
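
For context, "specifying a default value" means the usual add-a-pad-token-then-resize pattern; here is a minimal sketch (the token string and model path are placeholders, mirroring the full reproduction below):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('PATHTO/Llama-2-7b-hf')
model = AutoModelForCausalLM.from_pretrained('PATHTO/Llama-2-7b-hf')

# Llama-2 ships without a pad token; adding one grows the vocabulary from
# 32000 to 32001 entries, so the embedding matrix must be resized to match.
num_added = tokenizer.add_special_tokens({'pad_token': '<pad>'})
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))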

I identified the following two anomalies:

  1. Abnormal tensor shape
        params = [embeddings.weight]
        # embeddings.weight.size(0) is 32001 here
        context = (
            deepspeed.zero.GatheredParameters(params, modifier_rank=0)
            if is_deepspeed_zero3_enabled()
            else contextlib.nullcontext()
        )
        with context:
            for param in params:
                if param is None:
                    continue
                assert param.size(0) == new_num_embeddings, f'{param.size(0)}, {new_num_embeddings}'
                # bug here, param size is 32000 while new_num_embeddings is 32001, in 4.43.0 transformers
                param_data = param.data
                param_mean = param_data[:-num_new_embeddings].mean(dim=0, keepdim=True)
                param_data[-num_new_embeddings:] = param_mean
  2. Abnormal ds_id (see the diagnostic sketch after this list)
        params = [embeddings.weight]
        print(hasattr(embeddings.weight, 'ds_id'))
        # True for transformers 4.43.0, False for transformers 4.41.2
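
Both anomalies appear to be related: in 4.43.x the embedding weight carries DeepSpeed's ds_id attribute (i.e. it is handled as a ZeRO-3 partitioned parameter), and the row count seen inside deepspeed.zero.GatheredParameters (32000) no longer matches the resized vocabulary (32001). Here is a minimal diagnostic sketch, assuming a ZeRO-3 initialized model as in the reproduction below (illustrative only):

import deepspeed
from transformers.integrations.deepspeed import is_deepspeed_zero3_enabled

def report_embedding_shape(model, expected_rows):
    # Inspect the input embedding weight outside and inside a ZeRO-3 gather.
    weight = model.get_input_embeddings().weight
    print('ZeRO-3 partitioned:', hasattr(weight, 'ds_id'))
    print('shape outside gather:', tuple(weight.shape))  # a placeholder shape under ZeRO-3

    if is_deepspeed_zero3_enabled():
        # modifier_rank=None gathers the full parameter read-only on every rank.
        with deepspeed.zero.GatheredParameters([weight], modifier_rank=None):
            print('shape inside gather:', tuple(weight.shape), '| expected rows:', expected_rows)
    else:
        print('expected rows:', expected_rows)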

I've spent a lot of time pinpointing this issue, but I genuinely don't know how to resolve it. Any help would be incredibly valuable, and I'm very grateful for your time.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. The Python file:
import contextlib  # used by init_new_embeddings below
import json

import torch
import deepspeed

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from transformers.integrations.deepspeed import (
    HfDeepSpeedConfig,
    is_deepspeed_zero3_enabled,
)


DEFAULT_BOS_TOKEN: str = '<s>'
DEFAULT_EOS_TOKEN: str = '</s>'
DEFAULT_PAD_TOKEN: str = '<pad>'
DEFAULT_UNK_TOKEN: str = '<unk>'

model_name_or_path = 'PATHTO/Llama-2-7b-hf'
ds_cfgs_path = 'PATH'

deepspeed.init_distributed()

with open(ds_cfgs_path) as f:
    ds_cfgs = json.load(f)
    ds_cfgs['bf16']['enabled'] = True

dstchf = HfDeepSpeedConfig(ds_cfgs)  # keep this object alive so from_pretrained initializes weights with ZeRO-3 zero.Init

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    model_max_length=2048,
    padding_side='right',
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
)

# Reference: https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py
def resize_tokenizer_embedding(tokenizer, model) -> None:
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    def init_new_embeddings(
        embeddings,
        new_num_embeddings: int,
        num_new_embeddings: int,
    ) -> None:
        if embeddings is None:
            return

        params = [embeddings.weight]
        print(hasattr(embeddings.weight, 'ds_id'))
        # True for transformers 4.43.1, False for transformers 4.41.2
        exit()  # debug stop after the ds_id check; remove to reach the shape assert below
        context = (
            deepspeed.zero.GatheredParameters(params, modifier_rank=0)
            if is_deepspeed_zero3_enabled()
            else contextlib.nullcontext()
        )
        with context:
            for param in params:
                if param is None:
                    continue
                assert param.size(0) == new_num_embeddings, f'{param.size(0)}, {new_num_embeddings}'
                # bug here, param size is 32000 while new_num_embeddings is 32001
                param_data = param.data
                param_mean = param_data[:-num_new_embeddings].mean(dim=0, keepdim=True)
                param_data[-num_new_embeddings:] = param_mean

    special_tokens_dict = {}
    if tokenizer.pad_token is None:
        special_tokens_dict['pad_token'] = DEFAULT_PAD_TOKEN
    if tokenizer.eos_token is None:
        special_tokens_dict['eos_token'] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens_dict['bos_token'] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens_dict['unk_token'] = DEFAULT_UNK_TOKEN

    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    new_num_embeddings = len(tokenizer)

    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    if num_new_tokens > 0:
        hf_device_map = getattr(model, 'hf_device_map', {})
        devices = {
            torch.device(device)
            for device in hf_device_map.values()
            if device not in {'cpu', 'disk'}
        }
        is_model_parallel = len(devices) > 1

        if not is_model_parallel:
            model.resize_token_embeddings(new_num_embeddings)

            init_new_embeddings(
                model.get_input_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )
            init_new_embeddings(
                model.get_output_embeddings(),
                new_num_embeddings=new_num_embeddings,
                num_new_embeddings=num_new_tokens,
            )
            
resize_tokenizer_embedding(tokenizer=tokenizer, model=model)
  2. The DeepSpeed launch command:
deepspeed \
  --master_port 12345 \
  --module debug.py
  3. The ds cfgs:
{
  "train_batch_size": 128,
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": null,
  "steps_per_print": 10,
  "zero_optimization": {
      "stage": 3,
      "offload_param": {
          "device": "none"
      },
      "offload_optimizer": {
          "device": "none"
      },
      "param_persistence_threshold": 1e4,
      "max_live_parameters": 1e8,
      "prefetch_bucket_size": 3e7,
      "memory_efficient_linear": false,
      "gather_16bit_weights_on_model_save": true
  },
  "gradient_clipping": 1.0,
  "prescale_gradients": false,
  "wall_clock_breakdown": false,
  "hybrid_engine": {
      "enabled": false,
      "max_out_tokens": 512,
      "inference_tp_size": 1,
      "release_inference_cache": false,
      "pin_parameters": true,
      "tp_gather_partition_size": 8
  },
  "fp16": {
    "enabled": false,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": false
  }
}

Expected behavior

The embedding resizes correctly, as it does in transformers 4.41.2. Thanks!
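
Concretely, "resizes correctly" means that after resize_token_embeddings the gathered weights have len(tokenizer) rows. A minimal check, assuming the model, tokenizer and ZeRO-3 setup from the reproduction above (illustrative only):

# Assumes `model`, `tokenizer` and the ZeRO-3 init from the reproduction script.
import deepspeed

for module in (model.get_input_embeddings(), model.get_output_embeddings()):
    weight = module.weight
    # Read-only gather of the full parameter on every rank.
    with deepspeed.zero.GatheredParameters([weight], modifier_rank=None):
        assert weight.size(0) == len(tokenizer), (weight.size(0), len(tokenizer))  # expect 32001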

Gaiejj added the bug label on Jul 24, 2024
Gaiejj changed the title from "erroe occur in the resize_embedding" to "error occur in the resize_embedding" on Jul 24, 2024
@ArthurZucker
Collaborator

Hey! I think #32192 should have fixed it!

@seokhyunan

It seems the issue is still not fixed. You can check the progress in #32192.

@Gaiejj
Author

Gaiejj commented Jul 25, 2024

Thank you very much for your prompt response and continuous follow-up. I will closely monitor the latest updates. Thanks again for your hard work! ❤️

@seokhyunan

This issue is resolved by #32214! Thanks to @zucchini-nlp.

@ArthurZucker
Collaborator

On my way to do a patch then! Thanks all for reporting this quickly, and thanks @zucchini-nlp for your quick fixes!

@Gaiejj
Author

Gaiejj commented Jul 26, 2024

Congratulations ❤️! We have successfully run full-parameter PPO fine-tuning on Llama 3.1. Thanks again to @ArthurZucker, @iamseokhyun, and @zucchini-nlp for their super quick effort and follow-up!


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@ArthurZucker
Collaborator

Closing as completed!
