
Phi3 conversion OOM on A100 #44

Open
a8nova opened this issue Jun 10, 2024 · 12 comments

@a8nova

a8nova commented Jun 10, 2024

Description of the bug:

I wanted to convert Phi-3, so I made the necessary changes in my own fork (main...a8nova:ai-edge-torch:phi3), but the OOM killer is terminating my process.

Full error attached:
phi3_conversion_error.txt

Actual vs expected behavior:

The OOM killer terminates the conversion script on a Colab A100 instance.

Any other information you'd like to share?

  1. Is there anything wrong in the phi3 re-authoring? All changes can be viewed here: main...a8nova:ai-edge-torch:phi3
  2. Is there anything I can do to get it to convert (e.g., changing parameters to make it more memory-efficient)? See the sketch after this list.
  3. Any debugging tips?
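
A hedged illustration for question 2: the generative examples in ai-edge-torch follow roughly the pattern below, and shrinking the prefill sequence length and KV-cache size is one knob that may lower peak memory during conversion. The build_model helper, the phi3 module path, the sequence lengths, and the output path are assumptions about the fork, not confirmed details.

import torch
import ai_edge_torch

# Hypothetical builder from the phi3 branch; name and signature are assumptions.
from ai_edge_torch.generative.examples.phi3 import phi3


def convert_phi3_to_tflite(checkpoint_path: str):
  # Smaller prefill / KV-cache sizes may reduce peak memory during conversion.
  kv_cache_max_len = 1024
  prefill_seq_len = 512

  pytorch_model = phi3.build_model(
      checkpoint_path, kv_cache_max_len=kv_cache_max_len
  ).eval()  # eval() also addresses the "converted in training mode" warning.

  prefill_tokens = torch.full((1, prefill_seq_len), 0, dtype=torch.long)
  prefill_input_pos = torch.arange(0, prefill_seq_len)
  decode_token = torch.tensor([[0]], dtype=torch.long)
  decode_input_pos = torch.tensor([0], dtype=torch.int64)

  edge_model = (
      ai_edge_torch.signature(
          'prefill', pytorch_model, (prefill_tokens, prefill_input_pos)
      )
      .signature('decode', pytorch_model, (decode_token, decode_input_pos))
      .convert()
  )
  edge_model.export(
      f'/tmp/phi3_seq{prefill_seq_len}_kv{kv_cache_max_len}.tflite'
  )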
@a8nova a8nova changed the title Phi3 conversion fails on A100 Phi3 conversion OOM on A100 Jun 10, 2024
@haozha111
Contributor

Hi @a8nova, thanks for reporting the issue!

There is a known issue with high memory usage during the conversion process, which can get the conversion script killed. Which phi-3 version are you converting, and what is the size of the checkpoint you are using? A free Colab instance may only have 12 GB of RAM, which isn't enough. Do you happen to have:

  1. A Colab Pro subscription, or
  2. A Linux workstation (or cloud machine) with over 50 GB of memory?

We are still actively working on fixing the memory issue, and sorry for the inconvenience!

@haozha111 haozha111 self-assigned this Jun 10, 2024
@haozha111
Contributor

Also, from the conversion log it seems the memory consumption is coming from CUDA. Are you able to try CUDA_VISIBLE_DEVICES=-1 to disable GPU memory allocation? The conversion only needs to consume CPU memory.
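
A minimal sketch of doing this from a notebook cell, assuming the variable is set before any CUDA-aware library (torch, tensorflow, jax) is imported:

import os

# Hide all GPUs so the frameworks fall back to CPU-only allocation.
# This must run before torch / tensorflow / jax are imported.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import torch  # imported after the variable is set

print(torch.cuda.is_available())  # expected: False

From a shell, the equivalent is prefixing the command, e.g. CUDA_VISIBLE_DEVICES=-1 python convert_to_tflite.py (script name illustrative).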

@a8nova
Author

a8nova commented Jun 10, 2024

Hi @haozha111 - Thank you for the quick response.

Let me try setting CUDA_VISIBLE_DEVICES.

@a8nova
Author

a8nova commented Jun 10, 2024

I am also getting an OOM when running with CUDA_VISIBLE_DEVICES=-1 on a box with 53 GB of system RAM:

env: CUDA_VISIBLE_DEVICES=-1
/content/ai-edge-torch/ai_edge_torch/generative/examples/phi3
2024-06-10 20:14:26.133314: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-10 20:14:26.577974: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-10 20:14:28.834951: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
WARNING:root:PJRT is now the default runtime. For more information, see https://github.com/pytorch/xla/blob/master/docs/pjrt.md
WARNING:root:Defaulting to PJRT_DEVICE=CPU
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1718050475.549140   17434 cpu_client.cc:424] TfrtCpuClient created.
WARNING:root:Your model "prefill" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
WARNING:root:Your model "decode" is converted in training mode. Please set the module in evaluation mode with `module.eval()` for better on-device performance and compatibility.
2024-06-10 20:18:36.252751: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-06-10 20:18:36.252876: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:135] retrieving CUDA diagnostic information for host: 055cf236c060
2024-06-10 20:18:36.252891: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:142] hostname: 055cf236c060
2024-06-10 20:18:36.253133: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:166] libcuda reported version is: 535.104.5
2024-06-10 20:18:36.253165: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:170] kernel reported version is: 535.104.5
2024-06-10 20:18:36.253176: I external/local_xla/xla/stream_executor/cuda/cuda_diagnostics.cc:249] kernel version seems to match DSO: 535.104.5
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1718051375.007065   17434 tf_tfl_flatbuffer_helpers.cc:392] Ignored output_format.
W0000 00:00:1718051375.010046   17434 tf_tfl_flatbuffer_helpers.cc:395] Ignored drop_control_dependency.
2024-06-10 20:29:35.016643: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmpil1idwz1
2024-06-10 20:29:35.028233: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2024-06-10 20:29:35.028277: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmpil1idwz1
2024-06-10 20:29:35.126021: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-06-10 20:29:35.139828: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
^C
Colab resources at the time the process was killed (Colab Pro, Python 3 Google Compute Engine backend, GPU runtime):

System RAM: 1.5 / 53.0 GB
GPU RAM: 0.0 / 22.5 GB
Disk: 68.1 / 201.2 GB
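
As a debugging aid for question 3 above, one hedged option is to watch the conversion process's resident memory while it runs, to see at which stage usage peaks before the kill. The sketch below assumes psutil is installed; it is not part of ai-edge-torch.

import os
import threading
import time

import psutil


def log_rss_every(seconds: float = 5.0) -> threading.Thread:
  """Print this process's resident memory periodically from a daemon thread."""
  proc = psutil.Process(os.getpid())

  def _loop():
    while True:
      rss_gib = proc.memory_info().rss / 1024**3
      print(f'[mem-watch] RSS = {rss_gib:.1f} GiB', flush=True)
      time.sleep(seconds)

  watcher = threading.Thread(target=_loop, daemon=True)
  watcher.start()
  return watcher


# Start the watcher at the top of the conversion script; the last value printed
# before the OOM kill shows roughly where memory peaked.
log_rss_every(5.0)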

@haozha111
Contributor

Got it. Do you mind updating your branch with phi-3? Then we can fork it and try converting it ourselves. Thanks!

@a8nova
Author

a8nova commented Jun 11, 2024

The changes in the phi3 branch are up to date; you should be able to check out the branch and run the conversion script. Note that I also had to make changes to loader.py and feed_forward.py. Please let me know if you run into any issues. Thank you!

@haozha111 haozha111 assigned vamsimanchala and unassigned haozha111 Jun 11, 2024
@a8nova
Author

a8nova commented Jun 18, 2024

Hi @haozha111 @vamsimanchala - Any updates on this? Thanks!

@haozha111
Contributor

Hi @a8nova, we are making good progress on this issue; it requires some fixes in our converter stack. We plan to post an update here in the coming weeks. Thanks for your patience!

@mitsunami

Hi, I am also encountering the same issue. Although I cannot share the model details, it appears to be getting killed at the same point as seen in the logs above. I am looking forward to a fix for this issue. Thanks!

@haozha111
Contributor

Hi @mitsunami,

Are you converting on a Colab Pro instance or on a local Linux workstation, and how much memory do you have?

We are making great progress on reducing the converter's memory usage, and we will post an update on this issue soon. Thanks for your patience!

@mitsunami

Hi @haozha111,
I'm trying that on a local desktop with 64 GB RAM. Looking forward to an update. Thanks!

@vamsimanchala
Contributor

Hi @mitsunami, we recently landed some changes. Can you please try the conversion to TFLite again and let us know if things look good?

Thank you for your patience,
Vamsi Manchala
