
phi3 inference RuntimeError #1391

Open
khmyznikov opened this issue Sep 30, 2024 · 2 comments
khmyznikov commented Sep 30, 2024

Describe the bug
After some of the changes, the phi3 sample with the --inference flag stopped working.

Olive logs

python phi3.py --target cpu --precision int4 --inference --prompt "Write a story starting with once upon a time" --max_length 200

Generating Olive configuration file...
Olive configuration file is generated...

Generating optimized model for cpu ...

[2024-10-01 00:04:20,337] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-10-01 00:04:20,404] [INFO] [cache.py:51:__init__] Using cache directory: X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\cache\default_workflow
[2024-10-01 00:04:20,409] [INFO] [engine.py:975:save_olive_config] Saved Olive config to X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\cache\default_workflow\olive_config.json
[2024-10-01 00:04:20,415] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-10-01 00:04:20,418] [INFO] [engine.py:274:run] Running Olive on accelerator: cpu-cpu
[2024-10-01 00:04:20,419] [INFO] [engine.py:1068:_create_system] Creating target system ...
[2024-10-01 00:04:20,419] [INFO] [engine.py:1071:_create_system] Target system created in 0.000000 seconds
[2024-10-01 00:04:20,419] [INFO] [engine.py:1080:_create_system] Creating host system ...
[2024-10-01 00:04:20,419] [INFO] [engine.py:1083:_create_system] Host system created in 0.000000 seconds
[2024-10-01 00:04:20,429] [INFO] [engine.py:840:_run_pass] Running pass builder:ModelBuilder {}
[2024-10-01 00:04:20,431] [INFO] [engine.py:877:_run_pass] Loaded model from cache: 1_ModelBuilder-5180b740-35f3a3bd-cpu-cpu
[2024-10-01 00:04:21,309] [INFO] [engine.py:457:run_no_search] Saved output model to C:\Users\gkhmyznikov\AppData\Local\Temp\tmp8bsbfm3o\output_model
[2024-10-01 00:04:21,310] [INFO] [engine.py:367:run_accelerator] Save footprint to C:\Users\gkhmyznikov\AppData\Local\Temp\tmp8bsbfm3o\footprints.json.
[2024-10-01 00:04:21,313] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-10-01 00:04:21,314] [INFO] [engine.py:552:dump_run_history] Please install tabulate for better run history output
Command succeeded. Output model saved to X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\models\phi3

Model inference starts...
Loading model...
Model loaded in 2.86 seconds
Creating tokenizer...
Traceback (most recent call last):
  File "X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\phi3.py", line 342, in <module>
    main()
  File "X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\phi3.py", line 183, in main
    genai_run(prompts, str(output_path / "model"), max_length)
  File "X:\GitHub\Windows-DevRel\Tools\ML-tests\.temp\phi3.py", line 287, in genai_run
    tokenizer = og.Tokenizer(model)
                ^^^^^^^^^^^^^^^^^^^
RuntimeError: [json.exception.type_error.302] type must be string, but is array
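The `json.exception.type_error.302` message comes from the GenAI runtime's JSON parser: some field it expects to be a string is an array in one of the generated model configuration files (plausibly `genai_config.json` or `tokenizer_config.json` in the output model folder, though the exact file and field are not named in the error). As a rough diagnostic sketch, assuming those file names, one can scan the configs for array-valued entries to narrow down which field is mistyped:

```python
import json
from pathlib import Path

def find_array_fields(obj, path=""):
    """Recursively collect JSON paths whose value is an array."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            hits += find_array_fields(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        hits.append(path)
        for i, value in enumerate(obj):
            hits += find_array_fields(value, f"{path}[{i}]")
    return hits

# Point this at the generated model folder (path shown is an example,
# not the reporter's actual output location).
model_dir = Path("models/phi3/model")
for name in ("genai_config.json", "tokenizer_config.json"):
    cfg_file = model_dir / name
    if cfg_file.exists():
        config = json.loads(cfg_file.read_text(encoding="utf-8"))
        for field in find_array_fields(config):
            print(f"{name}: {field} is an array")
```

Comparing the array-valued fields against a known-good model folder (like the one from the maintainer's successful run below) should reveal which entry the tokenizer loader chokes on.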

Other information

  • OS: Windows
  • Olive version: main 0.7.0
  • ONNXRuntime package and version: onnxruntime 1.19.2
  • Transformers package version: transformers 4.45.1
devang-ml (Contributor) commented:

@shaahji
shaahji commented Oct 1, 2024

@khmyznikov I am unable to reproduce the issue with the current head of the Olive repo.

After some of the changes ....

You mentioned something about making changes. Did you mean changes made by the Olive developers, or changes you made yourself? If the changes are local to you, could you share what you did?

Here's the output from my local run at the tip of the Olive repo.

(olive) <local>\Olive\examples\phi3>python phi3.py --target cpu --precision int4 --inference --prompt "Write a story starting with once upon a time" --max_length 200

Generating Olive configuration file...
Olive configuration file is generated...

Generating optimized model for cpu ...

[2024-10-01 10:58:03,026] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-10-01 10:58:03,093] [INFO] [cache.py:51:__init__] Using cache directory: <local>\Olive\examples\phi3\cache\default_workflow
[2024-10-01 10:58:03,093] [INFO] [engine.py:975:save_olive_config] Saved Olive config to <local>\Olive\examples\phi3\cache\default_workflow\olive_config.json
[2024-10-01 10:58:03,109] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: cpu-cpu
[2024-10-01 10:58:03,109] [INFO] [engine.py:274:run] Running Olive on accelerator: cpu-cpu
[2024-10-01 10:58:03,109] [INFO] [engine.py:1068:_create_system] Creating target system ...
[2024-10-01 10:58:03,109] [INFO] [engine.py:1071:_create_system] Target system created in 0.000000 seconds
[2024-10-01 10:58:03,109] [INFO] [engine.py:1080:_create_system] Creating host system ...
[2024-10-01 10:58:03,109] [INFO] [engine.py:1083:_create_system] Host system created in 0.000000 seconds
[2024-10-01 10:58:03,141] [INFO] [engine.py:840:_run_pass] Running pass builder:ModelBuilder {}
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\configuration_auto.py:913: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GroupQueryAttention (GQA) is used in this model.
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\auto_factory.py:468: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
modeling_phi3.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 73.2k/73.2k [00:00<00:00, 4.81MB/s]
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
2024-10-01 10:58:09,512 transformers_modules.microsoft.Phi-3-mini-4k-instruct.0a67737cc96d2554230f90338b163bc6380a2a85.modeling_phi3 [WARNING] - `flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
2024-10-01 10:58:09,512 transformers_modules.microsoft.Phi-3-mini-4k-instruct.0a67737cc96d2554230f90338b163bc6380a2a85.modeling_phi3 [WARNING] - Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
<local>\.conda\envs\olive\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
model.safetensors.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.5k/16.5k [00:00<00:00, 2.10MB/s]
model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [14:55<00:00, 5.55MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.67G/2.67G [07:18<00:00, 6.09MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [22:14<00:00, 667.13s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.04s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 181/181 [00:00<?, ?B/s]
Reading embedding layer
Reading decoder layer 0
Reading decoder layer 1
Reading decoder layer 2
Reading decoder layer 3
Reading decoder layer 4
Reading decoder layer 5
Reading decoder layer 6
Reading decoder layer 7
Reading decoder layer 8
Reading decoder layer 9
Reading decoder layer 10
Reading decoder layer 11
Reading decoder layer 12
Reading decoder layer 13
Reading decoder layer 14
Reading decoder layer 15
Reading decoder layer 16
Reading decoder layer 17
Reading decoder layer 18
Reading decoder layer 19
Reading decoder layer 20
Reading decoder layer 21
Reading decoder layer 22
Reading decoder layer 23
Reading decoder layer 24
Reading decoder layer 25
Reading decoder layer 26
Reading decoder layer 27
Reading decoder layer 28
Reading decoder layer 29
Reading decoder layer 30
Reading decoder layer 31
Reading final norm
Reading LM head
Saving ONNX model in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model
<local>\.conda\envs\olive\lib\site-packages\transformers\generation\configuration_utils.py:814: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
Saving GenAI config in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model
<local>\.conda\envs\olive\lib\site-packages\transformers\models\auto\tokenization_auto.py:757: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.44k/3.44k [00:00<?, ?B/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 6.28MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.94M/1.94M [00:00<00:00, 6.22MB/s]
added_tokens.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 306/306 [00:00<00:00, 19.1kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 599/599 [00:00<00:00, 92.1kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Saving processing files in <local>\Olive\examples\phi3\cache\default_workflow\models\1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu\output_model for GenAI
[2024-10-01 11:23:23,189] [INFO] [engine.py:943:_run_pass] Pass builder:ModelBuilder finished in 1520.048102 seconds
[2024-10-01 11:23:25,341] [INFO] [engine.py:457:run_no_search] Saved output model to <local>\AppData\Local\Temp\tmpft22faik\output_model
[2024-10-01 11:23:25,346] [INFO] [engine.py:367:run_accelerator] Save footprint to <local>\AppData\Local\Temp\tmpft22faik\footprints.json.
[2024-10-01 11:23:25,350] [INFO] [engine.py:294:run] Run history for cpu-cpu:
[2024-10-01 11:23:25,362] [INFO] [engine.py:550:dump_run_history] run history:
+------------------------------------------+-------------------+--------------+----------------+-----------+
| model_id                                 | parent_model_id   | from_pass    |   duration_sec | metrics   |
+==========================================+===================+==============+================+===========+
| 3874e362                                 |                   |              |                |           |
+------------------------------------------+-------------------+--------------+----------------+-----------+
| 1_ModelBuilder-3874e362-35f3a3bd-cpu-cpu | 3874e362          | ModelBuilder |        1520.05 |           |
+------------------------------------------+-------------------+--------------+----------------+-----------+
Command succeeded. Output model saved to models\phi3

Model inference starts...
Loading model...
Model loaded in 3.16 seconds
Creating tokenizer...
Creating generator ...
Generator created

 <|user|>
Write a story starting with once upon a time<|end|>
<|assistant|>
 Once upon a time, in a small village nestled between rolling hills and lush green meadows, there lived a young girl named Lily. She was known for her kind heart and her insatiable curiosity about the world around her. Lily'ran as the sun rose, she would eagerly set out on her daily adventures, exploring the woods, fields, and streams that surrounded her village.

One day, as she was wandering through the woods, Lily stumbled upon an old, moss-covered stone with strange symbols etched into its surface. Intrigued, she decided to take the stone home and study it. As she examined the symbols, she noticed that they seemed to form a pattern, like a map.

Determined to uncover the mystery of the stone, Lily spent days and nights studying the symbols, drawing

Prompt tokens: 16, New tokens: 184, Time to first: 0.40s, New tokens per second: 15.01 tps
