
core[patch]: Agent Observations should be cast to string to avoid errors #26794

Merged

Conversation

keenborder786
Contributor

  • Description: When the user calls astream_events on an AgentExecutor, it executes __aiter__ and, for every chunk it gets from the agent executor, accesses its messages property. When the messages property of an AgentStep is accessed, _convert_agent_observation_to_messages is invoked, which wraps the observation in a HumanMessage. The problem is that HumanMessage expects its content to be a string, which was not guaranteed, so the error reported in the linked issue was raised.
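
As a rough sketch of the failure mode (hypothetical observation value, not code from this PR): a tool that returns a dict gets passed straight into HumanMessage(content=...), whose content only accepts a string or a list of content blocks, so validation fails; casting the observation first avoids the error.

import json

from langchain_core.messages import HumanMessage

# Hypothetical tool output: a dict rather than a plain string.
observation = {"answer": "the cat is under the bed", "source_documents": []}

try:
    HumanMessage(content=observation)
except Exception as exc:
    # Fails: content must be a string (or a list of content blocks), not a dict.
    print(type(exc).__name__)

# Casting the observation first, as this PR does, works.
message = HumanMessage(content=json.dumps(observation, ensure_ascii=False))
print(message.content)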


@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Sep 23, 2024
@keenborder786
Contributor Author

@eyurtsev

@dosubot dosubot bot added Ɑ: core Related to langchain-core 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Sep 23, 2024
content = observation
if not isinstance(observation, str):
    try:
        content = json.dumps(observation, ensure_ascii=False)
    except Exception:
        content = str(observation)
return [HumanMessage(content=content)]
Collaborator

Could you add a comment to explain why json.dumps is chosen as the default strategy vs. str? (I'm actually not sure which one should be used -- is there a good way to think about this?)
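
For a concrete sense of the difference being asked about, a quick illustration with a hypothetical dict observation:

import json

observation = {"answer": "socks, shoes and dust bunnies", "count": 3, "found": True}

# json.dumps produces valid JSON (double quotes, lowercase true/false).
print(json.dumps(observation, ensure_ascii=False))
# {"answer": "socks, shoes and dust bunnies", "count": 3, "found": true}

# str() produces the Python repr (single quotes, True/False), which is not valid JSON.
print(str(observation))
# {'answer': 'socks, shoes and dust bunnies', 'count': 3, 'found': True}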

Contributor Author

Okay, let me think about this.

Contributor

FWIW, I added these gists to repro the issue:

#26763 (comment)

This issue happens with tools that return a dict when the agent was created with create_react_agent: https://gist.github.com/sharrajesh/765c0b6edfe991363675f45d467e3c93

but it does not happen with the other variant of agent creation, create_openai_tools_agent, which I cannot switch to without introducing risk:
https://gist.github.com/sharrajesh/1080af5a95ae9d7b83a8da46597b68e1

I can give your branch a try to make sure it really works, if that helps.

Contributor Author

Yes, don't worry @sharrajesh, all of these will get resolved.

Contributor

Team, when do you expect this fix to be merged?

@eyurtsev eyurtsev self-assigned this Sep 23, 2024
@keenborder786
Contributor Author

@eyurtsev I ran the following code:

import asyncio
import time
import os
import random
import warnings
from typing import Type, Dict, Any

from langchain.agents import create_react_agent, AgentExecutor
from langchain.prompts import PromptTemplate
from langchain.tools import BaseTool
from langchain_openai import ChatOpenAI
from langsmith import Client
from pydantic import BaseModel, Field

# Suppress warnings
warnings.filterwarnings("ignore")

model = ChatOpenAI()


class CatHidingInput(BaseModel):
    query: str = Field(default="", description="Query about where the cat is hiding")


class CatHidingTool(BaseTool):
    name: str = "where_cat_is_hiding"
    description: str = "Use this tool to find out where the cat is hiding right now."
    args_schema: Type[BaseModel] = CatHidingInput

    async def _arun(self, query: str) -> str:
        return random.choice(["under the bed", "on the shelf"])

    def _run(self, query: str) -> str:
        return random.choice(["under the bed", "on the shelf"])


class GetItemsInput(BaseModel):
    place: str = Field(..., description="The place to look for items")


class GetItemsTool(BaseTool):
    name: str = "get_items"
    description: str = "Use this tool to look up which items are in the given place."
    args_schema: Type[BaseModel] = GetItemsInput

    async def _arun(self, place: str) -> Dict[str, Any]:
        return self._get_items(place)

    def _run(self, place: str) -> Dict[str, Any]:
        return self._get_items(place)

    def _get_items(self, place: str):
        items = ""
        if "bed" in place:
            items = "socks, shoes and dust bunnies"
        elif "shelf" in place:
            items = "books, pencils and pictures"
        else:
            items = "cat snacks"
        answer = f"The items in the {place} are: {items}"
        source_documents = [
            {
                "page_content": f"Items found in {place}: {items}",
                "metadata": {"source": "GetItemsTool", "place": place},
            }
        ]
        return {"answer": answer, "source_documents": source_documents}
        


client = Client()

REACT_PROMPT = PromptTemplate.from_template(
    """You are an AI assistant helping to find a cat and items in its location.
Human: {input}
AI: To solve this task, I have access to the following tools:
{tools}
The available tool names are: {tool_names}
Let's approach this step-by-step:
Always use the following format:
Thought: Consider what to do next
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
{agent_scratchpad}"""
)


async def run_agent_streaming():
    tools = [GetItemsTool(), CatHidingTool()]
    agent = create_react_agent(model, tools, REACT_PROMPT)
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        return_intermediate_steps=True,
        return_source_documents=True,
        handle_parsing_errors=True,
    ).with_config({"run_name": "Agent"})
    async for event in agent_executor.astream_events(
        {"input": "where is the cat hiding? what items are in that location?"},
        version="v2",
    ):
        kind = event["event"]


total_time = 0
if __name__ == "__main__":
    for _ in range(10):
        model = ChatOpenAI()
        start_time = time.time()
        asyncio.run(run_agent_streaming())
        end_time = time.time()
        print('Total Time Taken with `json.dumps`', end_time - start_time)
        total_time += end_time - start_time
    print(f'Average Time with `json.dumps` over 10 runs: {total_time/10}')

with both str and json.dumps. The json.dumps conversion is consistently lower in terms of time taken:

json.dumps

Total Time Taken with `json.dumps` 4.2268967628479
Total Time Taken with `json.dumps` 3.457839012145996
Total Time Taken with `json.dumps` 3.6213977336883545
Total Time Taken with `json.dumps` 3.8242459297180176
Total Time Taken with `json.dumps` 3.0543370246887207
Total Time Taken with `json.dumps` 3.5508298873901367
Total Time Taken with `json.dumps` 3.437480926513672
Total Time Taken with `json.dumps` 4.191598176956177
Total Time Taken with `json.dumps` 3.411051034927368
Total Time Taken with `json.dumps` 3.6134040355682373

str

Total Time Taken with `str` 3.4296929836273193
Total Time Taken with `str` 11.150007963180542
Total Time Taken with `str` 4.906854867935181
Total Time Taken with `str` 4.535444974899292
Total Time Taken with `str` 3.908177137374878
Total Time Taken with `str` 8.957874059677124
Total Time Taken with `str` 4.169111013412476
Total Time Taken with `str` 3.7205328941345215
Total Time Taken with `str` 3.4452409744262695
Total Time Taken with `str` 4.0477330684661865

Average time taken, str vs json.dumps, over 10 runs:

  • str => Average Time with str over 10 runs: 5.227066993713379
  • json.dumps => Average Time with json.dumps over 10 runs: 3.638908052444458

@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Sep 24, 2024
@eyurtsev
Collaborator

@keenborder786 benchmarking total time isn't relevant (at least not directly). If the encoding affected the number of cycles the agent had to do because it affected the chat model's ability to correctly use the encoded output, then sure, that's a relevant consideration, but that's not what's measured in these benchmarks.

The way to think about json.dumps vs. str is: for things typically returned from tools, how should that output be cast into string format so that it is best consumed by a chat model? That means the chat model would need to act on that information correctly.

Why did you choose json.dumps for this PR? Is there another part of the langchain code that has similar logic?

As an aside: the benchmarks themselves are completely dominated by noise from fluctuations in chat model run times.

If you care, you can verify this by calculating the standard error of the mean (which is the standard deviation divided by the square root of the number of observations) -- https://en.wikipedia.org/wiki/Standard_error. It'll give you a sense of the uncertainty in our measurement of the mean.
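
For what it's worth, a minimal sketch of that check against the json.dumps timings reported above (values rounded; the str runs, which swing from roughly 3.4 s to 11.2 s, would give a much larger spread):

import statistics
from math import sqrt

# json.dumps timings copied (rounded) from the benchmark output above.
times = [4.227, 3.458, 3.621, 3.824, 3.054, 3.551, 3.437, 4.192, 3.411, 3.613]

mean = statistics.mean(times)
sem = statistics.stdev(times) / sqrt(len(times))  # standard error of the mean
print(f"mean = {mean:.2f} s, SEM = {sem:.2f} s")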

@sharrajesh
Contributor

Please also try running the create_react_agent version of this code, which is causing the issue:

https://gist.github.com/sharrajesh/765c0b6edfe991363675f45d467e3c93

@keenborder786
Contributor Author


@eyurtsev okay

@keenborder786
Contributor Author

keenborder786 commented Sep 28, 2024


@eyurtsev:

  • You are right, but then I guess benchmarking is actually futile: if our concern is how the chat model would act on that information correctly, then we can always use json.dumps, and if it throws an error we can fall back to str to cast the observation.
  • Yes, I chose json.dumps because of _create_function_message, but without the exception handling. I guess we can use the exact same logic when casting (I just made a commit, please see):
    content = json.dumps(observation, ensure_ascii=False)
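
To illustrate why keeping the exception fallback matters (a sketch with a hypothetical non-JSON-serializable observation, not code from the PR):

import json

class SourceDocument:  # hypothetical object that json.dumps cannot serialize
    pass

observation = {"answer": "under the bed", "doc": SourceDocument()}

try:
    content = json.dumps(observation, ensure_ascii=False)
except Exception:
    # json.dumps raises TypeError here, so fall back to str().
    content = str(observation)

print(content)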

@eyurtsev
Collaborator

Yes, I chose json.dumps because of _create_function_message, but without the exception handling. I guess we can use the exact same logic when casting (I just made a commit, please see):

Got it -- I didn't see it from the GitHub diff.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Sep 30, 2024
@eyurtsev eyurtsev merged commit e12f570 into langchain-ai:master Sep 30, 2024
85 checks passed
@eyurtsev eyurtsev changed the title [Fix]: Agent Observations should be casted to string to avoid errors core[patch]: Agent Observations should be cast to string to avoid errors Sep 30, 2024
Sheepsta300 pushed a commit to Sheepsta300/langchain that referenced this pull request Oct 1, 2024
* [chore]: Agent Observation should be casted to string to avoid errors

* Merge branch 'master' into fix_observation_type_streaming

* [chore]: Using Json.dumps

* [chore]: Exact same logic as  when casting agent oobservation to string