
phi3 inference RuntimeError #1391

Open
khmyznikov opened this issue Sep 30, 2024 · 2 comments
khmyznikov commented Sep 30, 2024

Describe the bug
After some of the changes, the phi3 sample with the --inference flag stopped working.

Olive logs

python phi3.py --target cpu --precision int4 --inference --prompt "Write a story starting with once upon a time" --max_length 200

Generating Olive configuration file...
Olive configuration file is generated...

Generating optimized model for cpu ...

[2024-10-01 00:04:20,337] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-10-01 00:04:20,404] [INFO] [cache.py:51:__init__] Using cache directory: X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\cache\default_workflow
[2024-10-01 00:04:20,409] [INFO] [engine.py:975:save_olive_config] Saved Olive config to X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\cache\default_workflow\olive_config.json
[2024-10-01 00:04:20,415] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-10-01 00:04:20,418] [INFO] [engine.py:274:run] Running Olive on accelerator: cpu-cpu
[2024-10-01 00:04:20,419] [INFO] [engine.py:1068:_create_system] Creating target system ...
[2024-10-01 00:04:20,419] [INFO] [engine.py:1071:_create_system] Target system created in 0.000000 seconds
[2024-10-01 00:04:20,419] [INFO] [engine.py:1080:_create_system] Creating host system ...
[2024-10-01 00:04:20,419] [INFO] [engine.py:1083:_create_system] Host system created in 0.000000 seconds
[2024-10-01 00:04:20,429] [INFO] [engine.py:840:_run_pass] Running pass builder:ModelBuilder {}
[2024-10-01 00:04:20,431] [INFO] [engine.py:877:_run_pass] Loaded model from cache: 1_ModelBuilder-5180b740-35f3a3bd-cpu-cpu
[2024-10-01 00:04:21,309] [INFO] [engine.py:457:run_no_search] Saved output model to C:\Users\gkhmyznikov\AppData\Local\Temp\tmp8bsbfm3o\output_model
[2024-10-01 00:04:21,310] [INFO] [engine.py:367:run_accelerator] Save footprint to C:\Users\gkhmyznikov\AppData\Local\Temp\tmp8bsbfm3o\footprints.json.
[2024-10-01 00:04:21,313] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-10-01 00:04:21,314] [INFO] [engine.py:552:dump_run_history] Please install tabulate for better run history output
Command succeeded. Output model saved to X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\models\phi3

Model inference starts...
Loading model...
Model loaded in 2.86 seconds
Creating tokenizer...
Traceback (most recent call last):
  File "X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\phi3.py", line 342, in <module>
    main()
  File "X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\phi3.py", line 183, in main
    genai_run(prompts, str(output_path / "model"), max_length)
  File "X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\phi3.py", line 287, in genai_run
    tokenizer = og.Tokenizer(model)
                ^^^^^^^^^^^^^^^^^^^
RuntimeError: [json.exception.type_error.302] type must be string, but is array
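The `json.exception.type_error.302` message comes from the GenAI runtime's JSON parser: some field it expects to be a string is an array in one of the generated model configuration files (plausibly `genai_config.json` or `tokenizer_config.json` in the output model folder, though the exact file and field are not named in the error). As a rough diagnostic sketch, assuming those file names, one can scan the configs for array-valued entries to narrow down which field is mistyped:

```python
import json
from pathlib import Path

def find_array_fields(obj, path=""):
    """Recursively collect JSON paths whose value is an array."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            hits += find_array_fields(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        hits.append(path)
        for i, value in enumerate(obj):
            hits += find_array_fields(value, f"{path}[{i}]")
    return hits

# Point this at the generated model folder (path shown is an example,
# not the reporter's actual output location).
model_dir = Path("models/phi3/model")
for name in ("genai_config.json", "tokenizer_config.json"):
    cfg_file = model_dir / name
    if cfg_file.exists():
        config = json.loads(cfg_file.read_text(encoding="utf-8"))
        for field in find_array_fields(config):
            print(f"{name}: {field} is an array")
```

Comparing the array-valued fields against a known-good model folder (like the one from the maintainer's successful run below) should reveal which entry the tokenizer loader chokes on.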

Other information

  • OS: Windows
  • Olive version: main 0.7.0
  • ONNXRuntime package and version: onnxruntime 1.19.2
  • Transformers package version: transformers 4.45.1
devang-ml (Contributor) commented:

@shaahji
shaahji commented Oct 1, 2024

@khmyznikov I am unable to reproduce the issue with the current head of the Olive repo.

After some of the changes ....

You mentioned something about making changes. Did you mean changes made by the Olive developers, or changes you made yourself? If the changes are local to you, could you share what you did?

Here's the output from my local run at the tip of the Olive repo.

(olive) <local>\Olive\examples\phi3>python phi3.py --target cpu --precision int4 --inference --prompt "Write a story starting with once upon a time" --max_length 200

Generating Olive configuration file...
Olive configuration file is generated...

Generating optimized model for cpu ...

[2024-10-01 10:58:03,026] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-10-01 10:58:03,093] [INFO] [cache.py:51:__init__] Using cache directory: <local>\Olive\examples\phi3\cache\default_workflow
[2024-10-01 10:58:03,093] [INFO] [engine.py:975:save_olive_config] Saved Olive config to <local>\Olive\examples\phi3\cache\default_workflow\olive_config.json
[2024-10-01 10:58:03,109] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-10-01 10:58:03,109] [INFO] [engine.py:274:run] Running Olive on accelerator: cpu-cpu
[2024-10-01 10:58:03,109] [INFO] [engine.py:1068:_create_system] Creating target system ...
[2024-10-01 10:58:03,109] [INFO] [engine.py:1071:_create_system] Target system created in 0.000000 seconds
[2024-10-01 10:58:03,109] [INFO] [engine.py:1080:_create_system] Creating host system ...
[2024-10-01 10:58:03,109] [INFO] [engine.py:1083:_create_system] Host system created in 0.000000 seconds
[2024-10-01 10:58:03,141] [INFO] [engine.py:840:_run_pass] Running pass builder:ModelBuilder {}
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\configuration_auto.py:913: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GroupQueryAttention (GQA) is used in this model.
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\auto_factory.py:468: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
modeling_phi3.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 73.2k/73.2k [00:00<00:00, 4.81MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
2024-10-01 10:58:09,512 transformers_modules.microsoft.Phi-3-mini-4k-instruct.0a67737cc96d2554230f90338b163bc6380a2a85.modeling_phi3 [WARNING] - `flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
2024-10-01 10:58:09,512 transformers_modules.microsoft.Phi-3-mini-4k-instruct.0a67737cc96d2554230f90338b163bc6380a2a85.modeling_phi3 [WARNING] - Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
model.safetensors.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.5k/16.5k [00:00<00:00, 2.10MB/s]
model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [14:55<00:00, 5.55MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.67G/2.67G [07:18<00:00, 6.09MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [22:14<00:00, 667.13s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.04s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 181/181 [00:00<?, ?B/s]
Reading embedding layer
Reading decoder layer 0
Reading decoder layer 1
Reading decoder layer 2
Reading decoder layer 3
Reading decoder layer 4
Reading decoder layer 5
Reading decoder layer 6
Reading decoder layer 7
Reading decoder layer 8
Reading decoder layer 9
Reading decoder layer 10
Reading decoder layer 11
Reading decoder layer 12
Reading decoder layer 13
Reading decoder layer 14
Reading decoder layer 15
Reading decoder layer 16
Reading decoder layer 17
Reading decoder layer 18
Reading decoder layer 19
Reading decoder layer 20
Reading decoder layer 21
Reading decoder layer 22
Reading decoder layer 23
Reading decoder layer 24
Reading decoder layer 25
Reading decoder layer 26
Reading decoder layer 27
Reading decoder layer 28
Reading decoder layer 29
Reading decoder layer 30
Reading decoder layer 31
Reading final norm
Reading LM head
Saving ONNX model in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model
<local>\.conda\envs\olive\lib\site-packages\transformers\generation\configuration_utils.py:814: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
Saving GenAI config in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\tokenization_auto.py:757: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.44k/3.44k [00:00<?, ?B/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 6.28MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.94M/1.94M [00:00<00:00, 6.22MB/s]
added_tokens.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 306/306 [00:00<00:00, 19.1kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 599/599 [00:00<00:00, 92.1kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Saving processing files in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model for GenAI
[2024-10-01 11:23:23,189] [INFO] [engine.py:943:_run_pass] Pass builder:ModelBuilder finished in 1520.048102 seconds
[2024-10-01 11:23:25,341] [INFO] [engine.py:457:run_no_search] Saved output model to <local>\AppData\Local\Temp\tmpft22faik\output_model
[2024-10-01 11:23:25,346] [INFO] [engine.py:367:run_accelerator] Save footprint to <local>\AppData\Local\Temp\tmpft22faik\footprints.json.
[2024-10-01 11:23:25,350] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-10-01 11:23:25,362] [INFO] [engine.py:550:dump_run_history] run history:
+------------------------------------------+-------------------+--------------+----------------+-----------+
| model_id                                 | parent_model_id   | from_pass    |   duration_sec | metrics   |
+==========================================+===================+==============+================+===========+
| 3874e362                                 |                   |              |                |           |
+------------------------------------------+-------------------+--------------+----------------+-----------+
| 1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu | 3874e362          | ModelBuilder |        1520.05 |           |
+------------------------------------------+-------------------+--------------+----------------+-----------+
Command succeeded. Output model saved to models\phi3

Model inference starts...
Loading model...
Model loaded in 3.16 seconds
Creating tokenizer...
Creating generator ...
Generator created

 <|user|>
Write a story starting with once upon a time<|end|>
<|assistant|>
 Once upon a time, in a small village nestled between rolling hills and lush green meadows, there lived a young girl named Lily. She was known for her kind heart and her insatiable curiosity about the world around her. Lily'ran as the sun rose, she would eagerly set out on her daily adventures, exploring the woods, fields, and streams that surrounded her village.

One day, as she was wandering through the woods, Lily stumbled upon an old, moss-covered stone with strange symbols etched into its surface. Intrigued, she decided to take the stone home and study it. As she examined the symbols, she noticed that they seemed to form a pattern, like a map.

Determined to uncover the mystery of the stone, Lily spent days and nights studying the symbols, drawing

Prompt tokens: 16, New tokens: 184, Time to first: 0.40s, New tokens per second: 15.01 tps
