Replies: 3 comments
-
Hey @ItzBrein! I'm here to help you with your bug, questions, or anything else while you wait for a human maintainer. I'm a bot that can assist you with bug resolution, answer your questions, and guide you to become a contributor. Let's tackle this together!

To resolve this error, set the tokenizer's `pad_token_id`. Here is the relevant part of the code that addresses this issue:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id, **model_kwargs)
if tokenizer.pad_token is None:
    tokenizer.pad_token_id = model.config.eos_token_id
```

You should include this code snippet when setting up your pipeline:

```python
from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace
from langchain_community.agent_toolkits import create_sql_agent
from transformers import AutoConfig, AutoTokenizer

HF_TOKEN = ''
model_id = 'meta-llama/Meta-Llama-3.1-8B-Instruct'

# Load the tokenizer and set pad_token_id if necessary
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    # Read eos_token_id from the model config instead of loading the full model
    tokenizer.pad_token_id = AutoConfig.from_pretrained(model_id).eos_token_id

llm = HuggingFacePipeline.from_model_id(
    model_id=model_id,
    task="text-generation",
    device=None,
    model_kwargs=dict(
        device_map="auto",
    ),
    pipeline_kwargs=dict(
        token=HF_TOKEN,
        temperature=0.6,
        max_new_tokens=512,
        repetition_penalty=1.1,
    ),
)

chat_model = ChatHuggingFace(llm=llm)

agent_executor = create_sql_agent(chat_model, db=db, handle_parsing_errors=True, verbose=True)
agent_executor.invoke({"input": "What state is in the database?"})
```

This should resolve the error and allow batching to work correctly [1][2].
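For intuition, the batching requirement behind this error can be sketched in plain Python, with no transformers dependency (`pad_batch` is a hypothetical helper for illustration, not the library's actual collate code): shorter sequences in a batch must be padded out to the length of the longest one, which is impossible when no pad token id is defined.

```python
# Illustrative sketch of why batching needs a pad token: every sequence in
# a batch must be extended to the same length before it can be stacked
# into a tensor, and the filler value is pad_token_id.
def pad_batch(sequences, pad_token_id):
    if pad_token_id is None:
        # Mirrors the ValueError raised by transformers' pad_collate_fn
        raise ValueError(
            "Pipeline with tokenizer without pad_token cannot do batching."
        )
    max_len = max(len(s) for s in sequences)
    return [s + [pad_token_id] * (max_len - len(s)) for s in sequences]

print(pad_batch([[1, 2, 3], [4]], pad_token_id=0))  # → [[1, 2, 3], [4, 0, 0]]
```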
-
@ItzBrein use `chat_model.llm.pipeline.tokenizer.pad_token_id = chat_model.llm.pipeline.tokenizer.eos_token_id`
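That one-liner reuses the end-of-sequence id as the padding id. The fallback logic can be shown with a stand-in object (`DummyTokenizer` and `ensure_pad_token` are hypothetical names for illustration, not part of transformers or LangChain):

```python
# Minimal sketch of the pad-token fallback: when a tokenizer ships
# without a pad token, reuse its eos_token_id for padding.
class DummyTokenizer:
    """Stand-in for a real tokenizer that has no pad token configured."""
    def __init__(self, pad_token=None, pad_token_id=None, eos_token_id=128009):
        self.pad_token = pad_token
        self.pad_token_id = pad_token_id
        self.eos_token_id = eos_token_id

def ensure_pad_token(tokenizer):
    # Only touch pad_token_id when no pad token is already set
    if tokenizer.pad_token is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer

tok = ensure_pad_token(DummyTokenizer())
print(tok.pad_token_id)  # → 128009
```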
-
Is there no automated fix for this? Why do we have to keep applying all these settings manually every time?
-
Description
I am trying to use HuggingFacePipeline with ChatHuggingFace. I am expecting the agent to generate a SQL query and answer the question, but instead, I am getting the following error:
```
File ~/.local/lib/python3.11/site-packages/transformers/pipelines/base.py:146, in pad_collate_fn(tokenizer, feature_extractor)
    144 if tokenizer is not None:
    145     if tokenizer.pad_token_id is None:
--> 146         raise ValueError(
    147             "Pipeline with tokenizer without pad_token cannot do batching. You can try to set it with "
    148             "`pipe.tokenizer.pad_token_id = model.config.eos_token_id`."
    149         )
    150     else:
    151         t_padding_value = tokenizer.pad_token_id

ValueError: Pipeline with tokenizer without pad_token cannot do batching. You can try to set it with `pipe.tokenizer.pad_token_id = model.config.eos_token_id`.
```

System Info
OS: Linux