
Error in offline mode with trust_remote code: SFR-Embedding-Mistral and nomic does not work without einops #185

Closed
prasannakrish97 opened this issue Mar 30, 2024 · 4 comments · Fixed by #233

Comments

@prasannakrish97

Model description

You have mentioned that the SFR-Embedding model is supported, along with all other Hugging Face embedding models (ref. nomic).
However, neither of them is working:
infinity | ERROR 2024-03-21 14:35:59,554 infinity_emb ERROR: acceleration.py:21
infinity | BetterTransformer is not available for model. The
infinity | model type mistral is not yet supported to be used
infinity | with BetterTransformer. Feel free to open an issue
infinity | at https://github.com/huggingface/optimum/issues if
infinity | you would like this model type to be supported.
infinity | Currently supported models are: dict_keys(['albert',
infinity | 'bark', 'bart', 'bert', 'bert-generation',
infinity | 'blenderbot', 'bloom', 'camembert', 'blip-2',
infinity | 'clip', 'codegen', 'data2vec-text', 'deit',
infinity | 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2',
infinity | 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm',
infinity | 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt',
infinity | 'pegasus', 'rembert', 'prophetnet', 'roberta',
infinity | 'roc_bert', 'roformer', 'splinter', 'tapas', 't5',
infinity | 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2',
infinity | 'xlm-roberta', 'yolos']).. Continue without
infinity | bettertransformer modeling code.
infinity | Traceback (most recent call last):
infinity | File
infinity | "/app/infinity_emb/transformer/acceleration.py",
infinity | line 19, in to_bettertransformer
infinity | model = BetterTransformer.transform(model)
infinity | File "/usr/lib/python3.10/contextlib.py", line 79,
infinity | in inner
infinity | return func(*args, **kwds)
infinity | File
infinity | "/app/.venv/lib/python3.10/site-packages/optimum/bet
infinity | tertransformer/transformation.py", line 234, in
infinity | transform
infinity | raise NotImplementedError(
infinity | NotImplementedError: The model type mistral is not
infinity | yet supported to be used with BetterTransformer.

Open source status

  • The model implementation is available on transformers
  • The model weights are available on huggingface-hub
  • I verified that the model is currently not running in infinity

Provide useful links for the implementation

No response

@michaelfeil
Owner

Thanks for opening the issue. Did you actually try to get nomic running?

I would not be concerned about the stacktrace of

 infinity | NotImplementedError: The model type mistral is not

It's just an informational warning: optimum has no BetterTransformer rewrite for mistral, because transformers already ships a good attention implementation for it and no better one is available, so infinity continues with the stock model.
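For reference, the "Continue without bettertransformer modeling code" line in the log comes from a catch-and-continue path: the rewrite is attempted and, when optimum raises `NotImplementedError`, the unmodified model is kept. A minimal sketch of that pattern (the stub class is hypothetical so the snippet runs without optimum; this is not infinity's actual `acceleration.py`):

```python
class _StubBetterTransformer:
    """Stand-in for optimum.bettertransformer.BetterTransformer so this
    sketch runs without optimum installed (hypothetical, for illustration)."""

    @staticmethod
    def transform(model):
        raise NotImplementedError("The model type mistral is not yet supported")


def try_bettertransformer(model, bt=_StubBetterTransformer):
    """Attempt the BetterTransformer rewrite; on NotImplementedError keep the
    unmodified model, mirroring the 'Continue without bettertransformer
    modeling code' behavior shown in the log above."""
    try:
        return bt.transform(model)
    except NotImplementedError as exc:
        print(f"Continue without bettertransformer modeling code: {exc}")
        return model


model = object()
# The server keeps serving with the stock model; nothing fatal happened.
assert try_bettertransformer(model) is model
```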

nomic

python3 -m venv venv
source ./venv/bin/activate
pip install "infinity_emb[all]"
pip install einops  # einops is required only by nomic's custom modeling code
infinity_emb --model-name-or-path nomic-ai/nomic-embed-text-v1.5
(.venv) (base) michael@michael-laptop:~/infinity/libs/infinity_emb$ infinity_emb --model-name-or-path nomic-ai/nomic-embed-text-v1.5
INFO:     Started server process [426215]
INFO:     Waiting for application startup.
INFO     2024-03-30 09:31:45,673 infinity_emb INFO: model=`nomic-ai/nomic-embed-text-v1.5` selected, using engine=`torch` and device=`None`        select_model.py:54
INFO     2024-03-30 09:31:46,118 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer:                      SentenceTransformer.py:107
         nomic-ai/nomic-embed-text-v1.5                                                                                                                              
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 547M/547M [00:25<00:00, 21.9MB/s]
WARNING  2024-03-30 09:32:14,036                                                                                                        modeling_hf_nomic_bert.py:357
         transformers_modules.nomic-ai.nomic-embed-text-v1-unsupervised.3916676c856f1e25a4cc7a4e0ac740ea6ca9723a.modeling_hf_nomic_bert                              
         WARNING: <All keys matched successfully>                                                                                                                    
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1.19k/1.19k [00:00<00:00, 8.19MB/s]
vocab.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.87MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 2.94MB/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 695/695 [00:00<00:00, 5.14MB/s]
1_Pooling/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 286/286 [00:00<00:00, 1.99MB/s]
INFO     2024-03-30 09:32:16,061 sentence_transformers.SentenceTransformer INFO: Use pytorch device_name: cuda                             SentenceTransformer.py:213
INFO     2024-03-30 09:32:16,502 infinity_emb INFO: Adding optimizations via Huggingface optimum.                                                  acceleration.py:17
ERROR    2024-03-30 09:32:16,503 infinity_emb ERROR: BetterTransformer is not available for model. The model type nomic_bert is not yet supported  acceleration.py:21
         to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this                       
         model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot',                   
         'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj',                       
         'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet',                           
         'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos'])..                   
         Continue without bettertransformer modeling code.                                                                                                           
         Traceback (most recent call last):                                                                                                                          
           File "/home/michael/infinity/libs/infinity_emb/infinity_emb/transformer/acceleration.py", line 19, in to_bettertransformer                                
             model = BetterTransformer.transform(model)                                                                                                              
           File "/usr/lib/python3.10/contextlib.py", line 79, in inner                                                                                               
             return func(*args, **kwds)                                                                                                                              
           File "/home/michael/infinity/libs/infinity_emb/.venv/lib/python3.10/site-packages/optimum/bettertransformer/transformation.py", line                      
         234, in transform                                                                                                                                           
             raise NotImplementedError(                                                                                                                              
         NotImplementedError: The model type nomic_bert is not yet supported to be used with BetterTransformer. Feel free to open an issue at                        
         https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are:                            
         dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen',                            
         'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm',                             
         'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter',                         
         'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'xlm-roberta', 'yolos']).                                                                   
INFO     2024-03-30 09:32:16,510 infinity_emb INFO: Switching to half() precision (cuda: fp16).                                            sentence_transformer.py:73
INFO     2024-03-30 09:32:17,047 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=1                                select_model.py:77
                 5.65     ms tokenization                                                                                                                            
                 13.25    ms inference                                                                                                                               
                 0.26     ms post-processing                                                                                                                         
                 19.16    ms total                                                                                                                                   
         embeddings/sec: 1670.14                                                                                                                                     
INFO     2024-03-30 09:32:18,570 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=512                              select_model.py:83
                 14.14    ms tokenization                                                                                                                            
                 13.47    ms inference                                                                                                                               
                 726.95   ms post-processing                                                                                                                         
                 754.57   ms total                                                                                                                                   
         embeddings/sec: 42.41                                                                                                                                       
INFO     2024-03-30 09:32:18,572 infinity_emb INFO: model warmed up, between 42.41-1670.14 embeddings/sec at batch_size=32                         select_model.py:84
INFO     2024-03-30 09:32:18,574 infinity_emb INFO: creating batching engine                                                                     batch_handler.py:392
INFO     2024-03-30 09:32:18,575 infinity_emb INFO: ready to batch requests.                                                                     batch_handler.py:249
INFO     2024-03-30 09:32:18,577 infinity_emb INFO:                                                                                             infinity_server.py:64
                                                                                                                                                                     
         ♾️  Infinity - Embedding Inference Server                                                                                                                    
         MIT License; Copyright (c) 2023 Michael Feil                                                                                                                
         Version 0.0.31                                                                                                                                              
                                                                                                                                                                     
         Open the Docs via Swagger UI:                                                                                                                               
         http://0.0.0.0:7997/docs                                                                                                                                    
                                                                                                                                                                     
         Access model via 'GET':                                                                                                                                     
         curl http://0.0.0.0:7997/models                                                                                                                             
                                                                                                                                                                     
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7997 (Press CTRL+C to quit)
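Once the server logs "ready to batch requests", it can be queried over HTTP. A small client sketch; the `/embeddings` route and the OpenAI-style payload shape are assumptions here, so check the Swagger UI at `/docs` for the authoritative schema:

```python
import json


def build_payload(model: str, texts: list[str]) -> bytes:
    """Serialize an OpenAI-style /embeddings request body."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")


def post_embeddings(payload: bytes, url: str = "http://0.0.0.0:7997/embeddings") -> dict:
    """POST the payload; only call this against a running server."""
    import urllib.request

    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_payload("nomic-ai/nomic-embed-text-v1.5", ["hello world"])
# Against a live server: post_embeddings(payload)["data"][0]["embedding"]
```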

Mistral

@prasannakrish97 Can you try running the above commands and post the output here?

@prasannakrish97
Author

prasannakrish97 commented Apr 5, 2024

Hello,
We are using the docker image 0.0.31.
We installed our models (nomic-embed-text-v1 and nomic-embed-text-v1.5) locally (/data); there is no internet access.
einops is 0.7.0. We got an error after the aforementioned warning (the same error for both models).
SFR-Embedding-Mistral works as intended once past the warning, which can be ignored.

However, we're encountering the following problem for nomic (N.B.: the same nomic model works well locally with Text Embeddings Inference, but not with infinity):

infinity-nomic_1  | INFO:     Started server process [1]
infinity-nomic_1  | INFO:     Waiting for application startup.
infinity-nomic_1  | INFO     2024-04-05 08:52:36,666 infinity_emb INFO:           select_model.py:54
infinity-nomic_1  |          model=`/data` selected, using engine=`torch` and
infinity-nomic_1  |          device=`None`
infinity-nomic_1  | INFO     2024-04-05 08:52:36,678                      SentenceTransformer.py:107
infinity-nomic_1  |          sentence_transformers.SentenceTransformer
infinity-nomic_1  |          INFO: Load pretrained SentenceTransformer:
infinity-nomic_1  |          /data
infinity-nomic_1  | WARNING  2024-04-05 08:52:42,469                   modeling_hf_nomic_bert.py:357
infinity-nomic_1  |          transformers_modules.data.modeling_hf_nom
infinity-nomic_1  |          ic_bert WARNING: <All keys matched
infinity-nomic_1  |          successfully>
infinity-nomic_1  | INFO     2024-04-05 08:52:42,536                      SentenceTransformer.py:213
infinity-nomic_1  |          sentence_transformers.SentenceTransformer
infinity-nomic_1  |          INFO: Use pytorch device_name: cpu
infinity-nomic_1  | INFO     2024-04-05 08:52:42,560 infinity_emb INFO: Adding    acceleration.py:17
infinity-nomic_1  |          optimizations via Huggingface optimum.
infinity-nomic_1  | ERROR    2024-04-05 08:52:42,562 infinity_emb ERROR:          acceleration.py:21
infinity-nomic_1  |          BetterTransformer is not available for model. The
infinity-nomic_1  |          model type nomic_bert is not yet supported to be
infinity-nomic_1  |          used with BetterTransformer. Feel free to open an
infinity-nomic_1  |          issue at
infinity-nomic_1  |          https://github.com/huggingface/optimum/issues if you
infinity-nomic_1  |          would like this model type to be supported.
infinity-nomic_1  |          Currently supported models are: dict_keys(['albert',
infinity-nomic_1  |          'bark', 'bart', 'bert', 'bert-generation',
infinity-nomic_1  |          'blenderbot', 'bloom', 'camembert', 'blip-2',
infinity-nomic_1  |          'clip', 'codegen', 'data2vec-text', 'deit',
infinity-nomic_1  |          'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2',
infinity-nomic_1  |          'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm',
infinity-nomic_1  |          'm2m_100', 'marian', 'markuplm', 'mbart', 'opt',
infinity-nomic_1  |          'pegasus', 'rembert', 'prophetnet', 'roberta',
infinity-nomic_1  |          'roc_bert', 'roformer', 'splinter', 'tapas', 't5',
infinity-nomic_1  |          'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2',
infinity-nomic_1  |          'xlm-roberta', 'yolos']).. Continue without
infinity-nomic_1  |          bettertransformer modeling code.
infinity-nomic_1  |          Traceback (most recent call last):
infinity-nomic_1  |            File
infinity-nomic_1  |          "/app/infinity_emb/transformer/acceleration.py",
infinity-nomic_1  |          line 19, in to_bettertransformer
infinity-nomic_1  |              model = BetterTransformer.transform(model)
infinity-nomic_1  |            File "/usr/lib/python3.10/contextlib.py", line 79,
infinity-nomic_1  |          in inner
infinity-nomic_1  |              return func(*args, **kwds)
infinity-nomic_1  |            File
infinity-nomic_1  |          "/app/.venv/lib/python3.10/site-packages/optimum/bet
infinity-nomic_1  |          tertransformer/transformation.py", line 234, in
infinity-nomic_1  |          transform
infinity-nomic_1  |              raise NotImplementedError(
infinity-nomic_1  |          NotImplementedError: The model type nomic_bert is
infinity-nomic_1  |          not yet supported to be used with BetterTransformer.
infinity-nomic_1  |          Feel free to open an issue at
infinity-nomic_1  |          https://github.com/huggingface/optimum/issues if you
infinity-nomic_1  |          would like this model type to be supported.
infinity-nomic_1  |          Currently supported models are: dict_keys(['albert',
infinity-nomic_1  |          'bark', 'bart', 'bert', 'bert-generation',
infinity-nomic_1  |          'blenderbot', 'bloom', 'camembert', 'blip-2',
infinity-nomic_1  |          'clip', 'codegen', 'data2vec-text', 'deit',
infinity-nomic_1  |          'distilbert', 'electra', 'ernie', 'fsmt', 'gpt2',
infinity-nomic_1  |          'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm',
infinity-nomic_1  |          'm2m_100', 'marian', 'markuplm', 'mbart', 'opt',
infinity-nomic_1  |          'pegasus', 'rembert', 'prophetnet', 'roberta',
infinity-nomic_1  |          'roc_bert', 'roformer', 'splinter', 'tapas', 't5',
infinity-nomic_1  |          'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2',
infinity-nomic_1  |          'xlm-roberta', 'yolos']).
infinity-nomic_1  | ERROR:    Traceback (most recent call last):
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 677, in lifespan
infinity-nomic_1  |     async with self.lifespan_context(app) as maybe_state:
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 566, in __aenter__
infinity-nomic_1  |     await self._router.startup()
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 654, in startup
infinity-nomic_1  |     await handler()
infinity-nomic_1  |   File "/app/infinity_emb/infinity_server.py", line 62, in _startup
infinity-nomic_1  |     app.model = AsyncEmbeddingEngine.from_args(engine_args)
infinity-nomic_1  |   File "/app/infinity_emb/engine.py", line 49, in from_args
infinity-nomic_1  |     engine = cls(**asdict(engine_args), _show_deprecation_warning=False)
infinity-nomic_1  |   File "/app/infinity_emb/engine.py", line 40, in __init__
infinity-nomic_1  |     self._model, self._min_inference_t, self._max_inference_t = select_model(
infinity-nomic_1  |   File "/app/infinity_emb/inference/select_model.py", line 68, in select_model
infinity-nomic_1  |     loaded_engine.warmup(batch_size=engine_args.batch_size, n_tokens=1)
infinity-nomic_1  |   File "/app/infinity_emb/transformer/abstract.py", line 55, in warmup
infinity-nomic_1  |     return run_warmup(self, inp)
infinity-nomic_1  |   File "/app/infinity_emb/transformer/abstract.py", line 105, in run_warmup
infinity-nomic_1  |     embed = model.encode_core(feat)
infinity-nomic_1  |   File "/app/infinity_emb/transformer/embedder/sentence_transformer.py", line 97, in encode_core
infinity-nomic_1  |     out_features = self.forward(features)["sentence_embedding"]
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
infinity-nomic_1  |     input = module(input)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
infinity-nomic_1  |     return self._call_impl(*args, **kwargs)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
infinity-nomic_1  |     return forward_call(*args, **kwargs)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 98, in forward
infinity-nomic_1  |     output_states = self.auto_model(**trans_features, return_dict=False)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
infinity-nomic_1  |     return self._call_impl(*args, **kwargs)
infinity-nomic_1  |   File "/app/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
infinity-nomic_1  |     return forward_call(*args, **kwargs)
infinity-nomic_1  | TypeError: NomicBertModel.forward() got an unexpected keyword argument 'return_dict'
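The `TypeError` at the bottom is the real failure: sentence-transformers passes `return_dict=False` into the model, but the custom `NomicBertModel.forward()` in the pinned remote code does not declare that keyword. One illustrative workaround (NOT the fix that actually landed in #233) is to filter keyword arguments against the wrapped `forward()`'s signature; `DummyNomicBert` below is a hypothetical stand-in so the sketch runs standalone:

```python
import inspect


class KwargFilteringWrapper:
    """Illustrative shim (not infinity's actual fix): drop any keyword
    argument the wrapped model's forward() does not declare, such as
    `return_dict` for the custom NomicBertModel."""

    def __init__(self, model):
        self._model = model
        self._accepted = set(inspect.signature(model.forward).parameters)

    def __call__(self, *args, **kwargs):
        filtered = {k: v for k, v in kwargs.items() if k in self._accepted}
        return self._model.forward(*args, **filtered)


class DummyNomicBert:
    """Hypothetical stand-in whose forward(), like the pinned nomic remote
    code, has no `return_dict` parameter."""

    def forward(self, input_ids, attention_mask=None):
        return ("hidden_states", input_ids)


wrapped = KwargFilteringWrapper(DummyNomicBert())
# Passing return_dict=False no longer raises TypeError:
out = wrapped([1, 2, 3], attention_mask=None, return_dict=False)
```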

@michaelfeil michaelfeil changed the title SFR-Embedding-Mistral not supported Nor nomic Error in offline mode with trust_remote code: SFR-Embedding-Mistral not supported Nor nomic Apr 6, 2024
@michaelfeil
Copy link
Owner

Okay, I have shown above that it is possible to run infinity with nomic, so I'll suggest the following:

Try running again with these commands. Also delete all of your pre-existing huggingface_hub caches and pin an explicit commit. nomic runs with custom modeling code, so be aware that without pinning a specific revision you will execute whatever code they publish in any future version.

python3 -m venv venv
source ./venv/bin/activate
pip install "infinity_emb[all]"
pip install einops  # einops is required only by nomic's custom modeling code
infinity_emb --model-name-or-path nomic-ai/nomic-embed-text-v1.5 --revision some_specific_revision

#195: I plan to make it easier to "bake a model into a dockerfile"; too many people have had issues with that, and it requires too much knowledge of compatible huggingface_hub / sentence_transformers versions, cache paths, etc. Perhaps give it a try once it's merged.
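Until then, a model can be pre-fetched at a pinned revision so the container never needs network access at serve time. A sketch under stated assumptions (the helper name is hypothetical; it relies on `huggingface_hub`, which infinity_emb already depends on):

```python
def prefetch_pinned(repo_id: str, revision: str, cache_dir: str = "/data") -> str:
    """Download exactly one pinned revision of a model for offline serving.

    Hypothetical helper: assumes huggingface_hub is installed; the import is
    kept local so the sketch can be read without it.
    """
    from huggingface_hub import snapshot_download

    # Pinning `revision` ensures the trust_remote_code model cannot change
    # underneath you on a later pull.
    return snapshot_download(repo_id=repo_id, revision=revision, cache_dir=cache_dir)


# Example (needs network once, e.g. at image build time):
# prefetch_pinned("nomic-ai/nomic-embed-text-v1.5", revision="<pinned commit sha>")
```

At serve time, set `HF_HUB_OFFLINE=1` and point `--model-name-or-path` at the downloaded snapshot so the hub client never attempts a network call.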

@michaelfeil michaelfeil changed the title Error in offline mode with trust_remote code: SFR-Embedding-Mistral not supported Nor nomic Error in offline mode with trust_remote code: SFR-Embedding-Mistral and nomic does not work without einops Apr 10, 2024