ModuleNotFoundError: No module named 'transformers_modules' with API serving using phi-2b #3593

haining78zhang · 2024-03-24T09:52:35Z

Your current environment

PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-3.10.0-1062.18.1.el7.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2080 Ti
Nvidia driver version: 535.154.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
Stepping: 7
CPU max MHz: 3200.0000
CPU min MHz: 800.0000
BogoMIPS: 4200.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 1 MiB (32 instances)
L1i cache: 1 MiB (32 instances)
L2 cache: 32 MiB (32 instances)
L3 cache: 44 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; Load fences, usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.3
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      16-31,48-63     1               N/A

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

I tried to deploy API serving with phi-2b over a Ray cluster that runs on 2 Docker container instances, but I get an error:

 python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8080 --trust-remote-code --model=/data/models/phi-2b/ --tensor-parallel-size 2

ERROR worker.py:406 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RayWorkerVllm.init_worker() (pid=2374, ip=10.161.12.10, actor_id=a70ba90bc9c7da25d5d0824301000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f9243af2230>)  At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'
traceback: Traceback (most recent call last):
ModuleNotFoundError: No module named 'transformers_modules'
(RayWorkerVllm pid=2374, ip=10.161.12.10) No module named 'transformers_modules'
(RayWorkerVllm pid=2374, ip=10.161.12.10) Traceback (most recent call last):
(RayWorkerVllm pid=2374, ip=10.161.12.10)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 404, in deserialize_objects
(RayWorkerVllm pid=2374, ip=10.161.12.10)     obj = self._deserialize_object(data, metadata, object_ref)
(RayWorkerVllm pid=2374, ip=10.161.12.10)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 270, in _deserialize_object
(RayWorkerVllm pid=2374, ip=10.161.12.10)     return self._deserialize_msgpack_data(data, metadata_fields)
(RayWorkerVllm pid=2374, ip=10.161.12.10)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 225, in _deserialize_msgpack_data
(RayWorkerVllm pid=2374, ip=10.161.12.10)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(RayWorkerVllm pid=2374, ip=10.161.12.10)   File "/usr/local/lib/python3.10/dist-packages/ray/_private/serialization.py", line 215, in _deserialize_pickle5_data
(RayWorkerVllm pid=2374, ip=10.161.12.10)     obj = pickle.loads(in_band)
(RayWorkerVllm pid=2374, ip=10.161.12.10) ModuleNotFoundError: No module named 'transformers_modules'

It looks similar to issue #572; however, I am still blocked by this error.
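The traceback above ends in `pickle.loads`, which is consistent with how Python's `pickle` handles classes: it stores only the module and class name, so the receiving process must be able to import that module. The following is a minimal standard-library sketch of that failure mode, not a reproduction of the vLLM or Ray code path. The module name `fake_dynamic_modules` and the class `RemoteConfig` are hypothetical stand-ins for the dynamically generated `transformers_modules` package and a remote-code config class.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the dynamically generated `transformers_modules`.
MODULE_NAME = "fake_dynamic_modules"

# "Driver" side: create a module at runtime and define a config class in it.
module = types.ModuleType(MODULE_NAME)
exec("class RemoteConfig:\n    hidden_size = 2560\n", module.__dict__)
sys.modules[MODULE_NAME] = module

# pickle records only the module name and class name, not the class body.
payload = pickle.dumps(module.RemoteConfig())


def load_on_worker(data: bytes) -> str:
    """Simulate a process in which the dynamic module is not importable."""
    saved = sys.modules.pop(MODULE_NAME, None)
    try:
        pickle.loads(data)
        return "ok"
    except ModuleNotFoundError as exc:
        return f"ModuleNotFoundError: {exc}"
    finally:
        # Put the module back so the "driver" state is unchanged.
        if saved is not None:
            sys.modules[MODULE_NAME] = saved


print(load_on_worker(payload))
```

On a single node the worker processes may share the driver's module cache on disk, which could explain why the error tends to show up only when workers run on a second machine; that part is an assumption, not something the log confirms.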


youkaichao · 2024-03-24T14:35:23Z

> I tried to deploy API serving with phi-2b over a Ray cluster that runs on 2 Docker container instances, but I get an error:

Do you mean you use 2 Docker instances with 1 GPU each? You could try one Docker instance with 2 GPUs instead. I'm not sure it will work, but it is worth a try.

haining78zhang · 2024-03-25T00:53:14Z

> > I tried to deploy API serving with phi-2b over a Ray cluster that runs on 2 Docker container instances, but I get an error:
>
> Do you mean you use 2 Docker instances with 1 GPU each? You could try one Docker instance with 2 GPUs instead. I'm not sure it will work, but it is worth a try.

Yes, 2 Docker instances with 1 GPU each. Each Docker instance runs on its own physical node: I have two physical servers on the same network, each with one GPU and one Docker instance (the image was built on top of the official vllm/vllm-openai image). The Ray cluster seems to be working well, and the status report shows 2 GPUs. Because of power limitations I can't run two GPUs in the same machine, so the single-instance, 2-GPU setup is hard for me to try.

youkaichao · 2024-03-25T04:54:30Z

Sorry I'm not familiar with ray cluster. Maybe @simon-mo can help.

haining78zhang · 2024-03-25T09:27:24Z

> Sorry I'm not familiar with ray cluster. Maybe @simon-mo can help.

Perhaps @mklf?

haining78zhang · 2024-03-26T03:10:59Z

@mklf can Yijia also take a look?

Yang-x-Zhao · 2024-04-17T01:42:17Z

I am facing the same bug with the Qwen and Baichuan models.

I am also running 2 Docker instances on 2 nodes (each with 2 GPUs). I tried tensor_parallel=2 and tensor_parallel=4. With tensor_parallel=2 (running on 1 node only), Qwen and Baichuan work well. With tensor_parallel=4 (running on 2 nodes), Qwen and Baichuan show this error.

However, in the same environment, I can run Llama correctly with both tensor_parallel=4 and tensor_parallel=2.

DefTruth · 2024-04-23T03:59:21Z

Same error for me.

zhenfenxiao · 2024-05-09T07:59:27Z

I encountered the same error when I tried to deploy a fine-tuned Qwen model (from local storage) on two nodes.

baughmann · 2024-07-11T04:19:04Z

I'm having this issue myself with Phi-3-small when using the AsyncLLMEngine directly.

haining78zhang added the bug label Mar 24, 2024

haining78zhang closed this as completed Mar 25, 2024

haining78zhang reopened this Mar 25, 2024

DarkLight1337 mentioned this issue Jun 21, 2024: [Model] Initialize Phi-3-vision support #4986 (Merged)

baughmann mentioned this issue Jul 11, 2024: [Bug]: Phi3 - AsyncLLMEngine - trust_remote_code error #6263 (Closed)

tjohnson31415 mentioned this issue Jul 24, 2024: [Bugfix]: serialize config instances by value when using --trust-remote-code #6751 (Merged)

youkaichao closed this as completed in #6751 Oct 22, 2024
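The pull request that closed this issue is titled "serialize config instances by value". As a rough illustration of what "by value" means in general, here is a standard-library sketch in which the payload carries the class definition itself, so the receiver does not need to import the defining module. This is not the code from #6751; `fake_dynamic_modules`, `RemoteConfig`, `dumps_by_value`, and `loads_by_value` are hypothetical names used only for this example.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_dynamic_modules"  # hypothetical stand-in name
SOURCE = (
    "class RemoteConfig:\n"
    "    def __init__(self, hidden_size):\n"
    "        self.hidden_size = hidden_size\n"
)


def rebuild_config(module_name, source, state):
    """Recreate the defining module if it is missing, then restore the instance."""
    module = sys.modules.get(module_name)
    if module is None:
        module = types.ModuleType(module_name)
        exec(source, module.__dict__)
        sys.modules[module_name] = module
    cls = module.RemoteConfig
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


def dumps_by_value(obj):
    """Ship the class source and instance state instead of a class reference."""
    return pickle.dumps((MODULE_NAME, SOURCE, dict(obj.__dict__)))


def loads_by_value(payload):
    return rebuild_config(*pickle.loads(payload))


# "Driver" side: define the class in a runtime-created module and serialize.
module = types.ModuleType(MODULE_NAME)
exec(SOURCE, module.__dict__)
sys.modules[MODULE_NAME] = module
payload = dumps_by_value(module.RemoteConfig(hidden_size=2560))

# "Worker" side: the module is not importable here, yet loading succeeds.
del sys.modules[MODULE_NAME]
restored = loads_by_value(payload)
print(restored.hidden_size)
```

In practice a library such as cloudpickle can do this generically for a whole module through `cloudpickle.register_pickle_by_value`, which avoids hand-written rebuild functions like the one above.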
