[Bugfix]: serialize config instances by value when using --trust-remote-code #6751
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI, as it is required to merge (or just use auto-merge).
Great observation! In general we should not expect any new code when we use multi-node serving; HF's dynamically downloaded code and modules are a pain in this case. Converting the object to a dict makes sense to me. Does it have any side effects? Or could we just do it in all cases?
Yeah, this is something that we'll need to look at further. If the custom config class only adds new attributes and default values (e.g. no new methods used by the modeling code), then this conversion to a dict should work.
Will this be merged in the next release? It'd be great to have Ray engine support for Phi-3 and other models that use remote code.
Signed-off-by: Travis Johnson <[email protected]>
In #6607 (comment), a similar failure is reported. I'm also still working out how to test that the conversion works correctly.
Signed-off-by: Travis Johnson <[email protected]>
In my testing, I found that most attributes of the custom config could be attached to the PretrainedConfig, but some configurations are expected to be class attributes, and those would not be preserved. I did some more investigation into the serialization and found a much better solution: registering the dynamic module with cloudpickle so that config instances are serialized by value. This means that the custom config class no longer needs to be importable on the workers.
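The class-attribute caveat can be sketched in plain Python (`CustomConfig`, `rope_impl`, and `hidden_size` are hypothetical names, not taken from the PR):

```python
# Hypothetical sketch: copying a custom config's instance attributes
# into a plain dict drops anything defined at class level.
class CustomConfig:
    rope_impl = "su"            # class attribute: lives on the class

    def __init__(self):
        self.hidden_size = 128  # instance attribute: lives in __dict__


cfg = CustomConfig()
as_dict = dict(vars(cfg))       # only instance attributes are copied

print("hidden_size" in as_dict)  # True
print("rope_impl" in as_dict)    # False: class attributes are lost
```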
@youkaichao @rkooo567 This PR is now ready for review. Please take a look :)
Do you happen to know how the by-value serialization works?
Not in any detail, but I assume that it somehow serializes the class definition along with the instance data in what it communicates between the workers.
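That intuition — cloudpickle shipping the class definition alongside the instance data — can be checked directly. A minimal sketch, assuming `cloudpickle` is installed; `DynamicConfig` is a hypothetical stand-in for a runtime-generated HF config class:

```python
import pickle

import cloudpickle


def make_dynamic_class():
    # Stands in for a class generated at runtime (like the config
    # classes Transformers builds under transformers_modules): it is
    # not importable by name in another process.
    class DynamicConfig:
        def __init__(self, hidden_size=64):
            self.hidden_size = hidden_size

    return DynamicConfig


cfg = make_dynamic_class()(hidden_size=128)

# Standard pickle serializes classes by reference (module + qualified
# name), so an instance of a local class cannot be pickled at all.
by_ref_failed = False
try:
    pickle.dumps(cfg)
except Exception:
    by_ref_failed = True

# cloudpickle embeds the class definition in the payload, so the
# receiver reconstructs it without importing anything.
restored = pickle.loads(cloudpickle.dumps(cfg))
print(by_ref_failed, restored.hidden_size)  # True 128
```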
```python
# See: https://github.com/cloudpipe/cloudpickle?tab=readme-ov-file#overriding-pickles-serialization-mechanism-for-importable-constructs
try:
    import transformers_modules
    ray.cloudpickle.register_pickle_by_value(transformers_modules)
except ImportError:
    # No dynamic module was generated (the model does not use remote code).
    pass
```
Can use `import cloudpickle` rather than `ray.cloudpickle`.
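For a module that is importable at registration time, the by-value behavior has to be opted into explicitly. A minimal sketch of what the registration does, assuming `cloudpickle` is installed; the `fake_remote_module` file is a hypothetical stand-in for the generated `transformers_modules` package:

```python
import os
import pickle
import sys
import tempfile

import cloudpickle

# Write a throwaway module to disk to stand in for the dynamically
# generated transformers_modules package.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "fake_remote_module.py"), "w") as f:
    f.write("class RemoteConfig:\n"
            "    def __init__(self, n):\n"
            "        self.n = n\n")
sys.path.insert(0, tmp)
import fake_remote_module

# Opt the whole module into by-value pickling: class definitions now
# travel inside the payload instead of being looked up by name.
cloudpickle.register_pickle_by_value(fake_remote_module)
payload = cloudpickle.dumps(fake_remote_module.RemoteConfig(3))

# Simulate a worker node that cannot import the module.
sys.path.remove(tmp)
del sys.modules["fake_remote_module"]

obj = pickle.loads(payload)
print(obj.n)  # 3
```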
It is not currently possible to run vLLM with a model that requires `--trust-remote-code` if the server spans multiple nodes. The server will crash when it attempts to communicate the dynamically generated configuration class to a remote Ray worker; the crux of the error is a failure to import the dynamic `transformers_modules` module that Transformers generates when it loads the remote-code configuration for the model. In a multi-node context, this module is generated on the head node when `transformers.AutoConfig.from_pretrained()` is called, but it isn't generated on the other nodes.

This is a very similar issue to what was resolved in #871 and #4285, but now in a multi-node context. As noted in #871, the import failure occurs when Ray attempts to communicate the `ModelConfig` object, whose `hf_config` and `hf_text_config` reference the dynamically imported config class from `transformers_modules`, to the worker node. The fix in #871 became the util function `init_cached_hf_modules`, which runs `transformers.dynamic_module_utils` on each worker during the initialization of the `WorkerWrapperBase`. This generates the dynamic module base in `~/.cache/huggingface/modules` (which does need to happen once on each node) and also modifies the module search path to include the cache directory (which needs to happen in every worker), but it does not generate `transformers_modules`. Use of `init_cached_hf_modules` fixed the single-node case thanks to the modification of the module import path, but it does not fix the multi-node case.

A workaround would be to run vLLM or `transformers.AutoConfig.from_pretrained` on each node manually to generate the modules (or to get the generated module files onto each node some other way).

The implementation proposed in this PR is to use a feature of the `cloudpickle` library that allows the config objects to be serialized by value instead of by reference, so that the custom config class does not need to be importable in the remote workers. See https://github.com/cloudpipe/cloudpickle?tab=readme-ov-file#overriding-pickles-serialization-mechanism-for-importable-constructs. Doing this also obviates the need for `init_cached_hf_modules()`.

A similar error is reported in #6607, even without multiple GPUs. In that case, the failure occurs when using `--trust-remote-code` with `--engine-use-ray`. The fix proposed here resolves that issue as well.

Alternatives considered for multi-node (these do not fix the `--engine-use-ray` case):

- Call `AutoConfig.from_pretrained()` on each worker to use `transformers` to generate the dynamic module

FIX #3593
FIX #4169
FIX #6263
FIX #6607
FIX #8553
Also fixes the issue raised in #4986 (comment)