
core[patch]: Agent Observations should be cast to string to avoid errors #26794

Merged

Conversation

keenborder786
Contributor

  • Description: When the user calls astream_events on an AgentExecutor, it executes __aiter__ and, for every chunk it gets from the agent executor, accesses its messages property. When the messages property of an AgentStep is accessed, _convert_agent_observation_to_messages is invoked, which wraps the observation in a HumanMessage. The problem is that HumanMessage expects its content to be a string, which was not guaranteed, so the error reported in the linked issue was raised.
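
As a rough sketch of the failure mode (hypothetical observation value, not code from this PR): a tool that returns a dict gets passed straight into HumanMessage(content=...), whose content only accepts a string or a list of content blocks, so validation fails; casting the observation first avoids the error.

import json

from langchain_core.messages import HumanMessage

# Hypothetical tool output: a dict rather than a plain string.
observation = {"answer": "the cat is under the bed", "source_documents": []}

try:
    HumanMessage(content=observation)
except Exception as exc:
    # Fails: content must be a string (or a list of content blocks), not a dict.
    print(type(exc).__name__)

# Casting the observation first, as this PR does, works.
message = HumanMessage(content=json.dumps(observation, ensure_ascii=False))
print(message.content)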


@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Sep 23, 2024
@keenborder786
Contributor Author

@eyurtsev

@dosubot dosubot bot added Ɑ: core Related to langchain-core 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Sep 23, 2024
content = observation
if not isinstance(observation, str):
    try:
        content = json.dumps(observation, ensure_ascii=False)
    except Exception:
        content = str(observation)
return [HumanMessage(content=content)]
Collaborator

Could you add a comment to explain why json.dumps is chosen as the default strategy vs. str? (I'm actually not sure which one should be used -- is there a good way to think about this?)
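
For a concrete sense of the difference being asked about, a quick illustration with a hypothetical dict observation:

import json

observation = {"answer": "socks, shoes and dust bunnies", "count": 3, "found": True}

# json.dumps produces valid JSON (double quotes, lowercase true/false).
print(json.dumps(observation, ensure_ascii=False))
# {"answer": "socks, shoes and dust bunnies", "count": 3, "found": true}

# str() produces the Python repr (single quotes, True/False), which is not valid JSON.
print(str(observation))
# {'answer': 'socks, shoes and dust bunnies', 'count': 3, 'found': True}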

Contributor Author

Okay, let me think about this.

Contributor

FWIW, I added these gists to repro the issue:

#26763 (comment)

This issue happens with tools that return a dict when the agent was created with create_react_agent: https://gist.github.com/sharrajesh/765c0b6edfe991363675f45d467e3c93

but it does not happen with the other variant of agent creation, create_openai_tools_agent, which I cannot switch to without introducing risk:
https://gist.github.com/sharrajesh/1080af5a95ae9d7b83a8da46597b68e1

I can give your branch a try to make sure it really works, if that helps.

Contributor Author

Yes, don't worry @sharrajesh, all of these will get resolved.

Contributor

Team, when do you expect this fix to be merged?

@eyurtsev eyurtsev self-assigned this Sep 23, 2024
@keenborder786
Contributor Author

@eyurtsev I ran the following code:

import asyncio
import time
import os
import random
import warnings
from typing import Type, Dict, Any

from langchain.agents import create_react_agent, AgentExecutor
from langchain.prompts import PromptTemplate
from langchain.tools import BaseTool
from langchain_openai import ChatOpenAI
from langsmith import Client
from pydantic import BaseModel, Field

# Suppress warnings
warnings.filterwarnings("ignore")

model = ChatOpenAI()


class CatHidingInput(BaseModel):
    query: str = Field(default="", description="Query about where the cat is hiding")


class CatHidingTool(BaseTool):
    name: str = "where_cat_is_hiding"
    description: str = "Use this tool to find out where the cat is hiding right now."
    args_schema: Type[BaseModel] = CatHidingInput

    async def _arun(self, query: str) -> str:
        return random.choice(["under the bed", "on the shelf"])

    def _run(self, query: str) -> str:
        return random.choice(["under the bed", "on the shelf"])


class GetItemsInput(BaseModel):
    place: str = Field(..., description="The place to look for items")


class GetItemsTool(BaseTool):
    name: str = "get_items"
    description: str = "Use this tool to look up which items are in the given place."
    args_schema: Type[BaseModel] = GetItemsInput

    async def _arun(self, place: str) -> Dict[str, Any]:
        return self._get_items(place)

    def _run(self, place: str) -> Dict[str, Any]:
        return self._get_items(place)

    def _get_items(self, place: str):
        items = ""
        if "bed" in place:
            items = "socks, shoes and dust bunnies"
        elif "shelf" in place:
            items = "books, pencils and pictures"
        else:
            items = "cat snacks"
        answer = f"The items in the {place} are: {items}"
        source_documents = [
            {
                "page_content": f"Items found in {place}: {items}",
                "metadata": {"source": "GetItemsTool", "place": place},
            }
        ]
        return {"answer": answer, "source_documents": source_documents}
        


client = Client()

REACT_PROMPT = PromptTemplate.from_template(
    """You are an AI assistant helping to find a cat and items in its location.
Human: {input}
AI: To solve this task, I have access to the following tools:
{tools}
The available tool names are: {tool_names}
Let's approach this step-by-step:
Always use the following format:
Thought: Consider what to do next
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
{agent_scratchpad}"""
)


async def run_agent_streaming():
    tools = [GetItemsTool(), CatHidingTool()]
    agent = create_react_agent(model, tools, REACT_PROMPT)
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        return_intermediate_steps=True,
        return_source_documents=True,
        handle_parsing_errors=True,
    ).with_config({"run_name": "Agent"})
    async for event in agent_executor.astream_events(
        {"input": "where is the cat hiding? what items are in that location?"},
        version="v2",
    ):
        kind = event["event"]


total_time = 0
if __name__ == "__main__":
    for _ in range(10):
        model = ChatOpenAI()
        start_time = time.time()
        asyncio.run(run_agent_streaming())
        end_time = time.time()
        print('Total Time Taken with `json.dumps`', end_time - start_time)
        total_time += end_time - start_time
    print(f'Average Time with `json.dumps` over 10 runs: {total_time/10}')

with both str and json.dumps. The json.dumps conversion is consistently lower in terms of time taken:

json.dumps

Total Time Taken with `json.dumps` 4.2268967628479
Total Time Taken with `json.dumps` 3.457839012145996
Total Time Taken with `json.dumps` 3.6213977336883545
Total Time Taken with `json.dumps` 3.8242459297180176
Total Time Taken with `json.dumps` 3.0543370246887207
Total Time Taken with `json.dumps` 3.5508298873901367
Total Time Taken with `json.dumps` 3.437480926513672
Total Time Taken with `json.dumps` 4.191598176956177
Total Time Taken with `json.dumps` 3.411051034927368
Total Time Taken with `json.dumps` 3.6134040355682373

str

Total Time Taken with `str` 3.4296929836273193
Total Time Taken with `str` 11.150007963180542
Total Time Taken with `str` 4.906854867935181
Total Time Taken with `str` 4.535444974899292
Total Time Taken with `str` 3.908177137374878
Total Time Taken with `str` 8.957874059677124
Total Time Taken with `str` 4.169111013412476
Total Time Taken with `str` 3.7205328941345215
Total Time Taken with `str` 3.4452409744262695
Total Time Taken with `str` 4.0477330684661865

Average time taken, str vs json.dumps, over 10 runs:

  • str => Average Time with str over 10 runs: 5.227066993713379
  • json.dumps => Average Time with json.dumps over 10 runs: 3.638908052444458

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Sep 24, 2024
@eyurtsev
Collaborator

@keenborder786 benchmarking total time isn't relevant (at least not directly). If the encoding affected the number of cycles the agent had to do because it affected the chat model's ability to correctly use the encoded output, then sure, that's a relevant consideration, but that's not what's measured in these benchmarks.

The way to think about json.dumps vs. str is: for things typically returned from tools, how should that output be cast into string format so that it is best consumed by a chat model? That means the chat model would need to act on that information correctly.

Why did you choose json.dumps for this PR? Is there another part of the langchain code that has similar logic?

As an aside: the benchmarks themselves are completely dominated by noise from fluctuations in chat model run times.

If you care, you can verify this by calculating the standard error of the mean (which is the standard deviation divided by the square root of the number of observations) -- https://en.wikipedia.org/wiki/Standard_error. It'll give you a sense of the uncertainty in our measurement of the mean.
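
For what it's worth, a minimal sketch of that check against the json.dumps timings reported above (values rounded; the str runs, which swing from roughly 3.4 s to 11.2 s, would give a much larger spread):

import statistics
from math import sqrt

# json.dumps timings copied (rounded) from the benchmark output above.
times = [4.227, 3.458, 3.621, 3.824, 3.054, 3.551, 3.437, 4.192, 3.411, 3.613]

mean = statistics.mean(times)
sem = statistics.stdev(times) / sqrt(len(times))  # standard error of the mean
print(f"mean = {mean:.2f} s, SEM = {sem:.2f} s")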

@sharrajesh
Contributor

Please also try running the create_react_agent version of this code, which is causing the issue:

https://gist.github.com/sharrajesh/765c0b6edfe991363675f45d467e3c93

@keenborder786
Contributor Author


@eyurtsev okay

@keenborder786
Contributor Author

keenborder786 commented Sep 28, 2024


@eyurtsev:

  • You are right, but then I guess benchmarking is actually futile: if our concern is how the chat model would act on that information correctly, then we can always use json.dumps, and if it throws an error we can fall back to str to cast the observation.
  • Yes, I chose json.dumps because of _create_function_message, but without the exception handling. I guess we can use the exact same logic when casting (I just made a commit, please see):
    content = json.dumps(observation, ensure_ascii=False)
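
To illustrate why keeping the exception fallback matters (a sketch with a hypothetical non-JSON-serializable observation, not code from the PR):

import json

class SourceDocument:  # hypothetical object that json.dumps cannot serialize
    pass

observation = {"answer": "under the bed", "doc": SourceDocument()}

try:
    content = json.dumps(observation, ensure_ascii=False)
except Exception:
    # json.dumps raises TypeError here, so fall back to str().
    content = str(observation)

print(content)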

@eyurtsev
Collaborator

Yes, I chose json.dumps because of _create_function_message, but without the exception handling. I guess we can use the exact same logic when casting (I just made a commit, please see):

Got it -- I didn't see it from the GitHub diff.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Sep 30, 2024
@eyurtsev eyurtsev merged commit e12f570 into langchain-ai:master Sep 30, 2024
85 checks passed
@eyurtsev eyurtsev changed the title [Fix]: Agent Observations should be casted to string to avoid errors core[patch]: Agent Observations should be cast to string to avoid errors Sep 30, 2024
Sheepsta300 pushed a commit to Sheepsta300/langchain that referenced this pull request Oct 1, 2024
* [chore]: Agent Observation should be casted to string to avoid errors

* Merge branch 'master' into fix_observation_type_streaming

* [chore]: Using Json.dumps

* [chore]: Exact same logic as  when casting agent oobservation to string