
1-to-many transitions #359

Open
zilto opened this issue Sep 8, 2024 · 4 comments

@zilto
Collaborator

zilto commented Sep 8, 2024

Currently

Below is a valid Burr graph definition for "get user input, and select the most appropriate action out of 3":

graph = (
  GraphBuilder()
  .with_actions(
    process_user_input,
    decide_next_action,
    generate_text,
    generate_image,
    ask_for_details,
  )
  .with_transitions(
      ("process_user_input", "decide_next_action"),
      ("decide_next_action", "generate_text"),
      ("decide_next_action", "generate_image", expr("mode == 'generate_image'")),
      ("decide_next_action", "ask_for_details", expr("mode == 'ask_for_details'")),
  )
  .build()
)

Notes:

  • Needs a "decide node" that writes a value to state (here, mode); a minimal sketch of such a node follows this list.
  • The "decide node" exists because Condition objects must return a bool.
  • Has one transition per graph edge.
  • Requires explicit conditions on at least 2 of the 3 outgoing edges.
  • Conditions are evaluated sequentially.
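
For context, a minimal sketch of what such a "decide node" can look like (hypothetical routing logic, written against Burr's function-based API):

from burr.core import State, action

@action(reads=["user_query"], writes=["mode"])
def decide_next_action(state: State) -> tuple[dict, State]:
    """Write the routing value to state so downstream conditions can read it."""
    query = state["user_query"].lower()
    mode = "generate_image" if "image" in query else "generate_text"
    return {"mode": mode}, state.update(mode=mode)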

Desired solution

The main benefit of the above is that everything is explicit. But it can also be less ergonomic, add complexity to defining transitions, and be inefficient when computing transitions is expensive.

Consider the following API

app = (
  ApplicationBuilder()
  .with_actions(
    process_user_input,
    generate_text,
    generate_image,
    ask_user_for_details,
  )
  .with_transitions(
    ("process_user_input", "decide_next_action"),
    (
      "decide_next_action",
      ["generate_text", "generate_image", "ask_user_for_details"],
      OneToManyCondition(...)   # TODO
    ),
  )
  .with_entrypoint("process_user_input")
  .build()
)

Note:

  • No longer requires a "decide node"; that responsibility moves into a OneToManyCondition object. When "resolved", it should return the index or the name of the next action. A hypothetical sketch of its interface follows this list.
  • New transition syntax allowing 1 -> n. This makes it possible to resolve multiple binary conditions at once.
  • Arguably easier to read and to manage code changes; removing the sequential condition checking simplifies debugging.
  • Fewer transitions to define.
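
To make this concrete, here is one hypothetical shape the OneToManyCondition object could take (the name and signature are illustrative, not an existing Burr API):

from typing import Callable, Sequence

from burr.core import State

class OneToManyCondition:
    """Hypothetical: decides among n target actions in a single evaluation."""

    def __init__(self, keys: Sequence[str], resolver: Callable[[State], str]):
        self.keys = list(keys)    # state fields the resolver reads
        self.resolver = resolver  # returns the name (or index) of the next action

    def resolve(self, state: State) -> str:
        # one call replaces n sequential boolean checks
        return self.resolver(state)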

Use case

The most common use case I have in mind is "use an LLM to decide the next node". Given the graph-building process, it would be possible to dynamically create a model of the "available next actions". Here's a sketch using instructor, which provides better guarantees around structured LLM outputs (for OpenAI at least):

app = (
  ApplicationBuilder()
  .with_actions(
    process_user_input,
    generate_text,
    generate_image,
    ask_user_for_details,
  )
  .with_transitions(
    ("process_user_input", "decide_next_action"),
    (
      "decide_next_action",
      ["generate_text", "generate_image", "ask_user_for_details"],
      LLMDecider()
    ),
  )
  .with_entrypoint("process_user_input")
  .build()
)

from functools import partial
from typing import Literal

from pydantic import Field, create_model


def create_decision_model(tos: list[FunctionBasedAction]):
    """Build a response model whose next_action field is constrained to the names of the available next actions."""
    next_action_names = []
    description = ""
    for to in tos:
        if not to.__doc__:
            raise ValueError(f"LLMDecider: {to.name} needs to have a non-empty docstring.")

        next_action_names.append(to.name)
        description += f"{to.name}\n{to.__doc__}\n\n"

    return create_model(
        "next_action",
        next_action=(
            Literal[tuple(next_action_names)],
            Field(description="AVAILABLE ACTIONS\n\n" + description),
        ),
    )


def _llm_resolver(state: State, llm_client, response_model) -> str:
    user_query = state["user_query"]

    next_action = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=response_model,
        messages=[
            {"role": "system", "content": "You are an automated agent and you need to decide the next action to take to fulfill the user query."},
            {"role": "user", "content": user_query},
        ],
    )

    return next_action.next_action


# apply something along those lines
condition = Condition(
    keys=["user_query"],
    resolver=partial(_llm_resolver, llm_client=..., response_model=create_decision_model(...)),
)
@elijahbenizzy
Contributor

Yep, this is clean. OneToMany -> Select maybe?

Current transition = if/else
Select = switch statement
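
A sketch of the analogy, assuming the routing value lives in state under mode (hypothetical, just to illustrate the semantics):

# current transitions: checked in order, first match wins (if/elif/else)
if state["mode"] == "generate_image":
    next_action = "generate_image"
elif state["mode"] == "ask_for_details":
    next_action = "ask_for_details"
else:
    next_action = "generate_text"  # the unconditioned default

# Select: one evaluation names the target directly (switch)
next_action = state["mode"]  # assuming mode holds a valid action name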

@skrawcz
Contributor

skrawcz commented Sep 9, 2024

and be inefficient when computing transitions is expensive.

Can you clarify? I don't understand that comment.

No longer requires a "decide node";

I'm not sure I follow this, you still have a decide_next_action in your example? Isn't that the same as before?

LLMDecider()

I don't follow your code above, where is this defined?

Here's a sketch using instructor, which has better guarantees regarding structured LLM outputs (for OpenAI at least)

instructor is just an implementation to get the LLM to output something structured. I'm not sure how it's relevant here? You can use that same instructor call in the body of an action.
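
For reference, a minimal sketch of that conventional approach, with the instructor call inside the action body (NextAction is a hypothetical response model and llm_client an instructor-patched client, both elided here):

@action(reads=["user_query"], writes=["mode"])
def decide_next_action(state: State, llm_client) -> tuple[dict, State]:
    decision = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=NextAction,  # hypothetical pydantic model constraining the choice
        messages=[{"role": "user", "content": state["user_query"]}],
    )
    return {"mode": decision.next_action}, state.update(mode=decision.next_action)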

If I'm understanding correctly, you're saying that this explicitness / verboseness:

.with_transitions(
      ("process_user_input", "decide_next_action"),
      ("decide_next_action", "generate_text"),
      ("decide_next_action", "generate_image", expr("mode == 'generate_image'")),
      ("decide_next_action", "ask_for_details", expr("mode == 'ask_for_details'")),
  )

is slowing you down while iterating?

It's obviously a trade-off; to me, the following requires more work to understand how things work:

.with_transitions(
    ("process_user_input", "decide_next_action"),
    (
        "decide_next_action",
        ["generate_text", "generate_image", "ask_user_for_details"],
        LLMDecider()
    ),
)

Otherwise, the impacts on the UI and on debugging need to be considered, since effectively you're pushing state computation to an edge...


IIUC though, it sounds like the main pain is having to update the edge -- so we could enable a new expression/construct instead, e.g.

.with_transitions(
    ("process_user_input", "decide_next_action"),
    (
        "decide_next_action",
        ["generate_text", "generate_image", "ask_for_details"],
        switch("mode=={{action.name}}"),
    ),
)
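
One way switch could desugar (hypothetical helper; it expands the template into one expr() per target, preserving today's evaluation semantics):

from burr.core import expr

def expand_switch(template: str, targets: list[str]) -> list[tuple]:
    # "mode=={{action.name}}" -> ("generate_text", expr("mode=='generate_text'")), ...
    return [
        (name, expr(template.replace("{{action.name}}", f"'{name}'")))
        for name in targets
    ]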

@elijahbenizzy
Contributor

So I think the switch statement is nice as a concept, but I'd decouple it from the LLM stuff. A few things in the example above:

  1. Burying the heavy lifting in the edge (generally bad practice in Burr)
  2. Ensuring readability -- switch, or the OneToMany condition, helps here and keeps it concise
  3. Dynamic edges -- the LLMDecider code is hard to follow, and it seems to add a level of indirection that's not necessarily needed

But the switch statement is the main point, which I like. This depends on how common the pattern is -- I could see it being useful for tool calling, but I'm not sure how generalizable it is. TBH I'm not sure the mode-switching is actually a great first example, because ChatGPT just does that for you.

@zilto
Collaborator Author

zilto commented Sep 9, 2024

@skrawcz I used the LLMDecider idea as an example of a common use case, but it's not the point of the feature.

The main point is

If I'm understanding correctly, you're saying that this explicitness / verboseness:
is slowing you down while iterating?

.with_transitions(
      ("process_user_input", "decide_next_action"),
      ("decide_next_action", "generate_text"),
      ("decide_next_action", "generate_image", expr("mode == 'generate_image'")),
      ("decide_next_action", "ask_for_details", expr("mode == 'ask_for_details'")),
  )
  1. For one, the number of conditions here is small. The current interface provides no guarantee that all decide_next_action -> ... transitions are sorted and placed together, and users have no easy way to enforce that. Whereas the following makes it very obvious where to edit code:

(
  "decide_next_action",
  ("generate_text", "generate_image", "ask_for_details"),
  Select(...),
),

Also, we already support "many-to-one" definitions, so this doesn't seem outlandish:

(
    ("generate_text", "generate_image", "ask_user_for_details"),
    "send_response"
),
  2. Relying on the order of statements to evaluate conditions is brittle and not obvious to the user. If the app behaves oddly, they have to know that ordering matters. The good-faith user who wants to sort the above messy code (argument 1) will break their own app just by sorting it. This creates a "don't touch it if it ain't broken" situation and reduces maintainability.

and be inefficient when computing transitions is expensive.

  3. If a node has 1 -> n transitions, the current sequential checking has to run up to n checks, whereas Select runs a single check to decide between the n actions (see the sketch below).
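
A sketch of that difference, assuming conditions are checked in declaration order (hypothetical evaluation loop, not Burr's actual implementation):

# sequential conditions: up to n evaluations per step
def next_action_sequential(state, transitions):
    for target, condition in transitions:  # checked in declaration order
        if condition.resolve(state):
            return target

# Select: a single evaluation names the target among the n candidates
def next_action_select(state, select):
    return select.resolve(state)  # e.g. returns "generate_image"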
