The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey #847

ShellLM · 2024-07-25T11:20:17Z

The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey

Tula Masterman*, Sandi Besen*, Mason Sawtell*, Alex Chao

*Denotes Equal Contribution

Abstract

This survey paper examines the recent advancements in AI agent implementations, with a focus on their ability to achieve complex goals that require enhanced reasoning, planning, and tool execution capabilities. The primary objectives of this work are to a) communicate the current capabilities and limitations of existing AI agent implementations, b) share insights gained from our observations of these systems in action, and c) suggest important considerations for future developments in AI agent design. We achieve this by providing overviews of single-agent and multi-agent architectures, identifying key patterns and divergences in design choices, and evaluating their overall impact on accomplishing a provided goal. Our contribution outlines key themes when selecting an agentic architecture, the impact of leadership on agent systems, agent communication styles, and key phases for planning, execution, and reflection that enable robust AI agent systems.

Keywords: AI Agent, Agent Architecture, AI Reasoning, Planning, Tool Calling, Single Agent, Multi Agent, Agent Survey, LLM Agent, Autonomous Agent

1 Introduction

Since the launch of ChatGPT, many first-wave generative AI applications have been variations of chat over a corpus of documents using the Retrieval Augmented Generation (RAG) pattern. While much work continues on making RAG systems more robust, various groups are starting to build the next generation of AI applications, and they are converging on a common theme: agents.

Beginning with investigations into recent foundation models like GPT-4 and popularized through open-source projects like AutoGPT and BabyAGI, the research community has experimented with building autonomous agent-based systems [nakajima_yoheinakajimababyagi_2024, birr_autogptp_2024].

As opposed to zero-shot prompting of a large language model, where a user types into an open-ended text field and gets a result without additional input, agents allow for more complex interaction and orchestration. In particular, agentic systems have a notion of planning, loops, reflection, and other control structures that heavily leverage the model's inherent reasoning capabilities to accomplish a task end-to-end. Paired with the ability to use tools, plugins, and function calling, agents are empowered to do more general-purpose work.
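The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any framework's API: `scripted_policy` is a hypothetical stand-in for a language model that chooses the next action, and the `search` and `summarize` tools are placeholders.

```python
from typing import Callable

# Hypothetical tools; a real agent would wrap APIs or other functions here.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for '{q}'",
    "summarize": lambda text: text.upper(),  # placeholder transformation
}


def scripted_policy(goal: str, history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM: returns (action, argument).

    A real agent would prompt a model with the goal and the history of
    observations and parse its chosen action; here the choices are hard-coded.
    """
    if not history:
        return "search", goal
    if len(history) == 1:
        return "summarize", history[-1]
    return "finish", history[-1]


def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[str] = []  # observations fed back into the next decision
    for _ in range(max_steps):
        action, arg = scripted_policy(goal, history)
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)  # execute the chosen tool
        history.append(observation)  # the result informs the next iteration
    return "stopped: step budget exhausted"


print(run_agent("agent architectures"))  # prints RESULTS FOR 'AGENT ARCHITECTURES'
```

The `max_steps` budget is one simple way to keep such a loop from running indefinitely when the policy never chooses to finish.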

There is an ongoing debate in the community about whether single or multi-agent systems are best suited for solving complex tasks. Single agent architectures excel when problems are well defined and feedback from other agent personas or the user is not needed, while multi-agent architectures tend to perform better when collaboration and multiple distinct execution paths are required.

1.1 Taxonomy

Agents: AI agents are language model-powered entities able to plan and take actions to execute goals over multiple iterations. AI agent architectures consist of either a single agent or multiple agents working together to solve a problem.

Typically, each agent is given a persona and access to a variety of tools that help it accomplish its job, either independently or as part of a team. Some agents also contain a memory component, where they can save and load information outside of their messages and prompts. In this paper, we follow the definition of an agent as consisting of "brain, perception, and action" [xi2023rise]. These components satisfy the minimum requirements for agents to understand, reason about, and act on the environment around them.
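As a rough illustration of a memory component that lives outside the prompt, the sketch below persists key-value facts to a JSON file so that a later agent instance can reload them. `AgentMemory` and its file layout are hypothetical; real systems may use databases or vector stores instead.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class AgentMemory:
    """Hypothetical key-value memory stored outside the agent's prompt."""

    path: Path
    store: dict[str, str] = field(default_factory=dict)

    def save(self, key: str, value: str) -> None:
        self.store[key] = value
        self.path.write_text(json.dumps(self.store))  # persist immediately

    def load(self, key: str) -> Optional[str]:
        if self.path.exists():  # reload so a new instance sees earlier saves
            self.store = json.loads(self.path.read_text())
        return self.store.get(key)


with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.json"
    AgentMemory(memory_file).save("client", "Acme Corp")
    later_session = AgentMemory(memory_file)  # a fresh agent instance
    print(later_session.load("client"))  # prints Acme Corp
```

Because the facts live in a file rather than in the message history, they survive across conversations and do not consume context until the agent chooses to load them.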

Agent Persona: An agent persona describes the role or personality that the agent should take on, including any other instructions specific to that agent. Personas also contain descriptions of any tools the agent has access to. They make the agent aware of its role, the purpose of its tools, and how to leverage them effectively. Researchers have found that "shaped personality verifiably influences Large Language Model (LLM) behavior in common downstream (i.e. subsequent) tasks, such as writing social media posts" [serapiogarcía2023personality]. Solutions that use multiple agent personas to solve problems also show significant improvements over Chain-of-Thought (CoT) prompting, where the model is asked to break down its plans step by step [wang2024unleashing, wei_chain--thought_2023].

Tools: In the context of AI agents, tools represent any functions that the model can call. They allow the agent to interact with external data sources by pulling information from or pushing information to those sources. An example of an agent persona with associated tools is a professional contract writer. The writer is given a persona explaining its role and the types of tasks it must accomplish. It is also given tools for adding notes to a document, reading an existing document, and sending an email with a final draft.
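The contract-writer example can be sketched as a persona string plus a tool registry whose descriptions are rendered into the system prompt. The tool names (`read_document`, `add_note`, `send_email`), the in-memory stores, and the JSON tool-call format `{"name": ..., "arguments": {...}}` are all assumptions for illustration; real function-calling APIs define their own schemas.

```python
import json
from typing import Callable

# Hypothetical in-memory stores standing in for a real document system and mailer.
DOCUMENTS: dict[str, str] = {"contract.txt": "This Agreement is made between..."}
NOTES: list[str] = []
OUTBOX: list[tuple[str, str]] = []


def read_document(name: str) -> str:
    return DOCUMENTS[name]


def add_note(text: str) -> str:
    NOTES.append(text)
    return f"note {len(NOTES)} added"


def send_email(to: str, body: str) -> str:
    OUTBOX.append((to, body))
    return f"email queued for {to}"


PERSONA = (
    "You are a professional contract writer. Draft and revise contracts, "
    "record open issues as notes, and email the final draft when done."
)

# Registry: each tool pairs a function with the description shown to the model.
TOOLS: dict[str, tuple[Callable[..., str], str]] = {
    "read_document": (read_document, "Read an existing document by name."),
    "add_note": (add_note, "Attach a note to the current document."),
    "send_email": (send_email, "Email the final draft to a recipient."),
}


def system_prompt() -> str:
    tool_lines = "\n".join(f"- {name}: {desc}" for name, (_, desc) in TOOLS.items())
    return f"{PERSONA}\n\nAvailable tools:\n{tool_lines}"


def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted call in the assumed {"name", "arguments"} format."""
    call = json.loads(tool_call_json)
    func, _ = TOOLS[call["name"]]
    return func(**call["arguments"])


print(system_prompt())
print(dispatch('{"name": "add_note", "arguments": {"text": "Check indemnity clause"}}'))
```

Keeping the description next to the function means the persona's view of the tools and the code that actually runs cannot drift apart.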

Single Agent Architectures: These architectures are powered by one language model that performs all the reasoning, planning, and tool execution on its own. The agent is given a system prompt and any tools required to complete its task. In single agent patterns there is no feedback mechanism from other AI agents; however, humans may have options to provide feedback that guides the agent.

Multi-Agent Architectures: These architectures involve two or more agents, where each agent can use the same language model or a different one. The agents may have access to the same tools or different tools. Each agent typically has its own persona.

Multi-agent architectures can be organized in a wide variety of ways, at any level of complexity. In this paper, we divide them into two primary categories: vertical and horizontal. These categories represent two ends of a spectrum, and most existing architectures fall somewhere between the two extremes.

Vertical Architectures: In this structure, one agent acts as a leader and the other agents report directly to it. Depending on the architecture, reporting agents may communicate exclusively with the lead agent. Alternatively, a leader may be defined while all agents share a single conversation. The defining features of vertical architectures include having a lead agent and a clear division of labor between the collaborating agents.
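A vertical arrangement in which reporting agents talk only to the leader can be sketched as follows. In this hypothetical sketch the plan is passed in rather than generated by the leader, and each worker's `act` callable stands in for an LLM call with its own persona and tools.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Worker:
    name: str
    act: Callable[[str], str]  # stand-in for an LLM call with this agent's persona and tools


@dataclass
class Leader:
    workers: dict[str, Worker]  # keyed by the kind of subtask each worker handles
    log: list[str] = field(default_factory=list)  # all traffic passes through the leader

    def run(self, plan: list[tuple[str, str]]) -> str:
        reports = []
        for role, subtask in plan:  # the leader assigns each subtask to a worker
            worker = self.workers[role]
            self.log.append(f"leader -> {worker.name}: {subtask}")
            report = worker.act(subtask)
            self.log.append(f"{worker.name} -> leader: {report}")
            reports.append(report)
        return " | ".join(reports)  # the leader integrates the reports


leader = Leader({
    "research": Worker("researcher", lambda t: f"facts on {t}"),
    "writing": Worker("writer", lambda t: f"draft of {t}"),
})
print(leader.run([("research", "pricing"), ("writing", "summary")]))
```

The message log shows the defining property: every exchange has the leader on one side, and workers never address each other directly.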

Horizontal Architectures: In this structure, all agents are treated as equals and participate in one group discussion about the task. Communication between agents occurs in a shared thread where each agent can see all messages from the others. Agents can also volunteer to complete certain tasks or call tools, so work does not need to be assigned by a lead agent. Horizontal architectures are generally used for tasks where collaboration, feedback, and group discussion are key to overall success [chen_agentverse_2023].
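A horizontal discussion can be sketched as a shared thread that every peer reads in full, with each peer deciding for itself whether to contribute. The `Peer` type, the round limit, and the scripted `coder` and `reviewer` behaviors are hypothetical; in a real system each `respond` would be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Message = tuple[str, str]  # (speaker, text)


@dataclass
class Peer:
    name: str
    # Returns a message if the agent volunteers to contribute, else None.
    respond: Callable[[list[Message]], Optional[str]]


def group_discussion(peers: list[Peer], task: str, max_rounds: int = 3) -> list[Message]:
    thread: list[Message] = [("user", task)]  # one shared thread visible to all
    for _ in range(max_rounds):
        spoke = False
        for peer in peers:
            reply = peer.respond(thread)  # each peer reads the whole thread
            if reply is not None:  # peers volunteer; nobody assigns work
                thread.append((peer.name, reply))
                spoke = True
        if not spoke:  # stop once no one has anything to add
            break
    return thread


def coder(thread: list[Message]) -> Optional[str]:
    if not any(speaker == "coder" for speaker, _ in thread):
        return "def add(a, b): return a + b"
    return None


def reviewer(thread: list[Message]) -> Optional[str]:
    spoken = {speaker for speaker, _ in thread}
    if "coder" in spoken and "reviewer" not in spoken:
        return "Looks correct."
    return None


for speaker, text in group_discussion([Peer("coder", coder), Peer("reviewer", reviewer)], "Write add()"):
    print(f"{speaker}: {text}")
```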

2 Key Considerations for Effective Agents

2.1 Overview

Agents are designed to extend language model capabilities to solve real-world challenges. Successful implementations require robust problem-solving capabilities enabling agents to perform well on novel tasks. To solve real-world problems effectively, agents require the ability to reason and plan as well as call tools that interact with an external environment. In this section we explore why reasoning, planning, and tool calling are critical to agent success.

2.2 The Importance of Reasoning and Planning

Reasoning is a fundamental building block of human cognition, enabling people to make decisions, solve problems, and understand the world around them. AI agents need a strong ability to reason if they are to interact effectively with complex environments, make autonomous decisions, and assist humans in a wide range of tasks. A tight synergy between "acting" and "reasoning" allows new tasks to be learned quickly and enables robust decision making or reasoning, even under previously unseen circumstances or information uncertainties [yao_react_2023]. Agents also need reasoning to adjust their plans based on new feedback or newly learned information.

If agents lacking reasoning skills are tasked with acting on straightforward tasks, they may misinterpret the query, generate a response based on a literal understanding, or fail to consider multi-step implications.

Planning, which requires strong reasoning abilities, commonly falls into one of five major approaches: task decomposition, multi-plan selection, external module-aided planning, reflection and refinement, and memory-augmented planning [huang2024understanding]. These approaches allow the model to either break the task down into subtasks, select one plan from many generated options, leverage a preexisting external plan, revise previous plans based on new information, or leverage external information to improve the plan.
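Of these approaches, multi-plan selection is the easiest to show compactly. In the sketch below, the candidate plans are hard-coded stand-ins for plans a model might generate, and `score` is a hypothetical heuristic; in practice the model itself or a separate evaluator might rank the candidates.

```python
# Hypothetical candidate plans for "book a trip"; a model would normally generate these.
CANDIDATES: list[list[str]] = [
    ["search flights", "book flight"],
    ["search flights", "compare prices", "book flight", "email itinerary"],
    ["book flight", "email itinerary"],
]
REQUIRED = {"search flights", "book flight", "email itinerary"}


def score(plan: list[str]) -> float:
    """Hypothetical heuristic: reward covering required steps, lightly penalize length."""
    coverage = len(REQUIRED & set(plan)) / len(REQUIRED)
    return coverage - 0.05 * len(plan)


def select_plan(candidates: list[list[str]]) -> list[str]:
    return max(candidates, key=score)  # keep the single best-scoring plan


print(select_plan(CANDIDATES))
```

Here the second plan wins (score 0.8 versus about 0.57 for the others) because it is the only one covering every required step, even though it is the longest.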

Most agent patterns have a dedicated planning step which invokes one or more of these techniques to create a plan before any actions are executed. For example, Plan Like a Graph (PLaG) is an approach that represents plans as directed graphs, with multiple steps being executed in parallel [lin_graph-enhanced_2024, yao_tree_2023]. This can provide a significant performance increase over other methods on tasks that contain many independent subtasks that benefit from asynchronous execution.
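The execution side of a graph-shaped plan can be sketched with `asyncio`: steps whose dependencies are all complete run concurrently, wave by wave. This illustrates parallel execution of a dependency graph in general, not PLaG's published method (which concerns how plans are represented to the model); the step names and `run_step` are hypothetical.

```python
import asyncio

# Hypothetical plan: each step maps to the set of steps it depends on.
PLAN: dict[str, set[str]] = {
    "fetch_weather": set(),
    "fetch_calendar": set(),
    "fetch_traffic": set(),
    "draft_schedule": {"fetch_weather", "fetch_calendar", "fetch_traffic"},
}


async def run_step(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a tool or model call
    return f"{name} done"


async def execute(plan: dict[str, set[str]]) -> list[list[str]]:
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(plan):
        # A step is ready once all of its dependencies have finished.
        ready = sorted(s for s, deps in plan.items() if s not in done and deps <= done)
        if not ready:
            raise ValueError("plan has a cycle or a missing dependency")
        await asyncio.gather(*(run_step(s) for s in ready))  # independent steps run concurrently
        done.update(ready)
        waves.append(ready)
    return waves


print(asyncio.run(execute(PLAN)))
```

With three independent fetches, the plan finishes in two waves (roughly 0.2 s of simulated latency) instead of four sequential steps (roughly 0.4 s), which is the kind of gain the paragraph above describes for tasks with many independent subtasks.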

2.3 The Importance of Effective Tool Calling

One key benefit of the agent abstraction over prompting base language models is the agents' ability to solve complex problems by calling multiple tools. These tools enable the agent to interact with external data sources, send or retrieve information from existing APIs, and more. Problems that require extensive tool calling often go hand in hand with those that require complex reasoning.

Both single-agent and multi-agent architectures can be used to solve challenging tasks by employing reasoning and tool calling steps. Many methods use multiple iterations of reasoning, memory, and reflection to effectively and accurately complete problems [liu_llm_2024, shinn_reflexion_2023, yao_react_2023]. They often do this by breaking a larger problem into smaller subproblems, and then solving each one with the appropriate tools in sequence.
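An iterate-and-reflect loop in the spirit of these methods can be sketched as follows. The scripted `attempt` and `evaluate` functions are hypothetical stand-ins: the first plays the role of the model, the second an external check such as unit tests. This is a simplification, not the published Reflexion implementation.

```python
from typing import Callable


def reflect_and_retry(
    attempt: Callable[[list[str]], str],
    evaluate: Callable[[str], tuple[bool, str]],
    max_trials: int = 3,
) -> tuple[str, list[str]]:
    reflections: list[str] = []  # memory carried across trials
    answer = ""
    for _ in range(max_trials):
        answer = attempt(reflections)  # act, conditioned on past reflections
        ok, feedback = evaluate(answer)  # external check of the result
        if ok:
            break
        reflections.append(f"Previous answer {answer!r} failed: {feedback}")
    return answer, reflections


def attempt(reflections: list[str]) -> str:
    return "42" if reflections else "32"  # scripted: improves after one reflection


def evaluate(answer: str) -> tuple[bool, str]:
    return (True, "") if answer == "42" else (False, f"17 + 25 is not {answer}")


answer, notes = reflect_and_retry(attempt, evaluate)
print(answer)  # prints 42
print(notes)
```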

Other work on advancing agent patterns highlights that while breaking a larger problem into smaller subproblems can be effective for solving complex tasks, single agent patterns often struggle to complete the long sequence of steps required [shi_learning_2024, gao_efficient_2024].

Multi-agent patterns can address the issues of parallel tasks and robustness, since individual agents can work on individual subproblems. Many multi-agent patterns start by taking a complex problem and breaking it down into several smaller tasks. Then, each agent works independently on its task using its own set of tools.
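The decompose-then-parallelize pattern can be sketched with a thread pool in which each sub-agent holds only its own tools. The agent names, tools, and the hard-coded split into subtasks are assumptions; in a real system a planner agent would normally produce the subtasks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Tool = Callable[[str], str]


class SubAgent:
    def __init__(self, name: str, tools: dict[str, Tool]):
        self.name = name
        self.tools = tools  # each agent can call only its own tools

    def solve(self, subtask: tuple[str, str]) -> str:
        tool_name, arg = subtask
        if tool_name not in self.tools:
            return f"{self.name}: no access to {tool_name}"
        return f"{self.name}: {self.tools[tool_name](arg)}"


agents = [
    SubAgent("math_agent", {"square": lambda x: str(int(x) ** 2)}),
    SubAgent("text_agent", {"reverse": lambda s: s[::-1]}),
]
subtasks = [("square", "12"), ("reverse", "agent")]  # hard-coded decomposition

with ThreadPoolExecutor() as pool:
    # map() preserves input order, so results line up with subtasks.
    results = list(pool.map(lambda pair: pair[0].solve(pair[1]), zip(agents, subtasks)))
print(results)
```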

https://arxiv.org/html/2404.11584v1
