🗺 prompt playground #3435

Open
6 of 43 tasks
mikeldking opened this issue Jun 10, 2024 · 3 comments

@mikeldking
Contributor

mikeldking commented Jun 10, 2024

As a user of Phoenix, I don't want to have to go back to my IDE to iterate on a prompt. I want to be able to take the data stored in Phoenix (spans, datasets) and run it through a prompt.

Use-cases

  • Replay a template change on an LLM Span
  • Run a template change on a dataset
  • Construct an evaluation template against a single chosen production span or dataset - the workflow is testing your evals, with the ability to save the run as an experiment
  • Synthetic data generation - generate synthetic data or add columns to existing rows of a dataset to help create test data

Milestone 1 - LLM Replay

The most important capability is taking a specific LLM step (i.e. a span) and replaying its execution. When re-executing that step, we should record the new response so it can be compared and evaluated against the original.
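Below is a minimal sketch of what span replay could look like from the client side. It assumes the original chat messages of an LLM span have already been exported from Phoenix (for example via px.Client().get_spans_dataframe()); the hard-coded messages, model name, and editing step are illustrative stand-ins, not an existing Phoenix API.

```python
from openai import OpenAI

client = OpenAI()

# Stand-in for the chat messages recorded on the original LLM span.
original_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the attached document."},
]

# Edit the template (here: swap the system prompt) and replay the call.
edited_messages = [
    {"role": "system", "content": "You are a terse assistant. Answer in one sentence."},
    *original_messages[1:],
]

replayed = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=edited_messages,
)

# Record the new response so it can be compared against the original span's output.
print(replayed.choices[0].message.content)
```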

Out of Scope

  • Focus on chat completions (messages with roles) - not plain text completions

Planning

UI

API

Server

Instrumentation

Milestone 2 - Datasets on Playgrounds

Add support to run a set of dataset examples through a prompt
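As a rough sketch of this flow, assuming each dataset example carries an "input" dict of template variables (the in-memory examples list below stands in for a Phoenix dataset):

```python
from openai import OpenAI

client = OpenAI()

# Stand-in for dataset examples pulled from Phoenix.
examples = [
    {"input": {"question": "What is Phoenix?"}},
    {"input": {"question": "How do I log spans?"}},
]

prompt_template = "Answer concisely: {question}"

results = []
for example in examples:
    messages = [{"role": "user", "content": prompt_template.format(**example["input"])}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    # Keep the output next to its input so the run can be scored or saved as an experiment.
    results.append({"input": example["input"], "output": response.choices[0].message.content})
```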

Milestone 3 - Add support for Annotations

As a user, I want the ability to quickly annotate the result of an invocation
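One way to record such an annotation with today's client, sketched under the assumption that a human rating is logged back as a span evaluation (the span ID, eval name, and score are placeholders):

```python
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# A human rating attached to a specific span; "<span_id>" is a placeholder
# for the ID of the span being annotated.
ratings = pd.DataFrame(
    {"score": [8], "explanation": ["Good format, slightly verbose"]},
    index=pd.Index(["<span_id>"], name="context.span_id"),
)

px.Client().log_evaluations(SpanEvaluations(eval_name="human rating", dataframe=ratings))
```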

@heralight

Hi!

Enhancement proposal

This feature should be similar to #2462 but with more depth. It would involve a simple button to replicate a query into an edit mode, allowing you to replay it. Additionally, it should offer the possibility to add notes on the result iterations, such as rating the quality, output format, etc., on a scale of 1 to 10.

Goal

The goal is to facilitate quick testing of prompts and inputs, enabling evaluation and visualization of progression.

Thank you,

Alexandre

@mikeldking
Contributor Author

@heralight Hey! Thanks for the feedback! We have a ton of features coming out with regards to prompt iteration, notably prompt experiments, which have evaluations built in. Stay tuned.

Noted on the replay and the annotations :) Will give it some thought. We have a few ideas around replaying data through prompts, but haven't thought much about human annotations on different prompt versions. Would love to hear more.

@heralight

Very nice!
My ideal workflow would be:

  • trace some OpenAI calls made from code (see the tracing sketch after this list)
  • transform some of them into a replayable prompt, where each modification can be tested, versioned, annotated, and ranked
  • prompts can be parameterized and called from code
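A minimal sketch of that first tracing step, assuming the OpenInference OpenAI instrumentor and a locally launched Phoenix instance; the exact wiring (phoenix.otel.register here) may differ between Phoenix versions:

```python
import phoenix as px
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

# Launch a local Phoenix instance and point an OpenTelemetry tracer provider at it.
px.launch_app()
tracer_provider = register()

# Instrument the OpenAI client so every chat completion is recorded as an LLM span.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from a traced call"}],
)
```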

Best,
