Adding evals for natural language workflow building. #14417
Open
malexanderlim wants to merge 19 commits into master from component-retrieval-evals
+6,793
−115
Changes from 1 commit
Commits
2cd57b6 Adding evals for natural language workflow building. (malexanderlim)
a9b710a Adding evaluator.mjs (dylburger)
ab34992 pnpm-lock (dylburger)
74b4b5b Updating incorrect evals. (malexanderlim)
4f35c8c Output eval results as a csv (bzwrk)
d9f5de0 Merge remote-tracking branch 'origin/master' into component-retrieval… (dylburger)
a95b5b6 triggers -> sources on eval JSON (dylburger)
80f2256 pnpm (dylburger)
58f7eb9 New Components - pdf_app_net (#14406) (michelle0927)
3cc761b renew webhooks (#14386) (michelle0927)
019f1cb OpenAI - Add audio functionality to Chat action (#14367) (michelle0927)
724d098 Adding app scaffolding for jina_reader (danhsiung)
8f89ee2 Runware: new action component (#14380) (jcortes)
d514b8d New Components - smstools (#14378) (luancazarine)
d7b33bc roamresearch: new action components (#14385) (jcortes)
cd3d560 New Components - everhour (#14307) (luancazarine)
9a16df6 Added actions (#14426) (lcaresia)
855a266 [Components] liveswitch #13859 (#14427) (lcaresia)
d78b009 Adding new validated list of evals. (malexanderlim)
877 changes: 877 additions & 0 deletions
packages/evals/component_retrieval/eval-test-suite-0-100-filtered.json
Large diffs are not rendered by default.

455 changes: 455 additions & 0 deletions
packages/evals/component_retrieval/eval-test-suite-101-150-verified-filtered.json
Large diffs are not rendered by default.

466 changes: 466 additions & 0 deletions
packages/evals/component_retrieval/eval-test-suite-151-200-verified-filtered.json
Large diffs are not rendered by default.

513 changes: 513 additions & 0 deletions
packages/evals/component_retrieval/eval-test-suite-201-250-complex-filtered.json
Large diffs are not rendered by default.

461 changes: 461 additions & 0 deletions
packages/evals/component_retrieval/eval-test-suite-251-300-new-filtered.json
Large diffs are not rendered by default.

214 changes: 214 additions & 0 deletions
packages/evals/component_retrieval/eval-test-suite-ai-focus-filtered.json
@@ -0,0 +1,214 @@
{
  "evaluationTests": [
    {
      "query": "When new support tickets come in through Zendesk, analyze sentiment with GPT and prioritize in Linear based on urgency",
      "triggers": [
        "zendesk-new-ticket"
      ],
      "actions": [
        "openai-chat",
        "linear-create-issue"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "I need sales calls recorded in Gong to be transcribed and summarized for the team in Slack",
      "triggers": [
        "gong-new-call"
      ],
      "actions": [
        "openai-create-transcription",
        "openai-chat",
        "slack-send-message"
      ],
      "persona": "verbose"
    },
    {
      "query": "When leads submit Typeform responses, use GPT to qualify them and update their status in HubSpot",
      "triggers": [
        "typeform-new-submission"
      ],
      "actions": [
        "openai-chat",
        "hubspot-create-or-update-contact"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "Get help from AI to classify and organize our Notion knowledge base",
      "triggers": [
        "notion-new-page"
      ],
      "actions": [
        "openai-chat",
        "notion-create-page-from-database"
      ],
      "persona": "task-oriented"
    },
    {
      "query": "When customers message us on Intercom, analyze intent with GPT before creating tickets",
      "triggers": [
        "intercom-new-conversation"
      ],
      "actions": [
        "openai-chat",
        "linear-create-issue"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "Analyze customer feedback from Delighted with AI and update account health in Salesforce",
      "triggers": [],
      "actions": [
        "openai-chat",
        "salesforce_rest_api-update-contact"
      ],
      "persona": "task-oriented"
    },
    {
      "query": "When new videos are uploaded to Zoom, I want them transcribed and summarized for the team",
      "triggers": [
        "zoom-recording-completed"
      ],
      "actions": [
        "openai-create-transcription",
        "openai-chat",
        "slack-send-message"
      ],
      "persona": "verbose"
    },
    {
      "query": "Use AI to analyze Github issues and suggest priority levels in Linear",
      "triggers": [],
      "actions": [
        "openai-chat",
        "linear-create-issue"
      ],
      "persona": "task-oriented"
    },
    {
      "query": "When documents are uploaded to Google Drive, use GPT to generate summaries in Notion",
      "triggers": [
        "google_drive-new-files-instant"
      ],
      "actions": [
        "openai-chat",
        "notion-create-page-from-database"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "Analyze customer churn risk based on Intercom conversations using GPT",
      "triggers": [
        "intercom-new-conversation"
      ],
      "actions": [
        "openai-chat",
        "hubspot-create-or-update-contact"
      ],
      "persona": "task-oriented"
    },
    {
      "query": "When new RSS articles mention our company, use AI to analyze sentiment and alert team",
      "triggers": [
        "rss-new-item-in-feed"
      ],
      "actions": [
        "openai-chat",
        "slack-send-message"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "I need GPT to help categorize incoming feature requests from Canny into our product roadmap",
      "triggers": [],
      "actions": [
        "openai-chat",
        "notion-create-page-from-database"
      ],
      "persona": "verbose"
    },
    {
      "query": "When customers respond to our Typeform survey, analyze trends with AI and update dashboards",
      "triggers": [
        "typeform-new-submission"
      ],
      "actions": [
        "openai-chat",
        "google_sheets-add-single-row"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "Use GPT to analyze Calendly meeting notes and create action items in Asana",
      "triggers": [
        "calendly_v2-new-event-scheduled"
      ],
      "actions": [
        "openai-chat",
        "asana-create-task"
      ],
      "persona": "task-oriented"
    },
    {
      "query": "When support team sends emails in Gmail, let AI check tone and suggest improvements",
      "triggers": [
        "gmail-new-email"
      ],
      "actions": [
        "openai-chat"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "I need customer conversations from Help Scout to be analyzed for product feedback",
      "triggers": [],
      "actions": [
        "openai-chat",
        "notion-create-page-from-database"
      ],
      "persona": "verbose"
    },
    {
      "query": "When new comments appear on our YouTube videos, use AI to moderate and flag issues",
      "triggers": [
        "youtube_data_api-new-comment-posted"
      ],
      "actions": [
        "openai-chat",
        "slack-send-message"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "Use GPT to analyze Twitter mentions and create support tickets when needed",
      "triggers": [
        "twitter-new-mention-received-by-user"
      ],
      "actions": [
        "openai-chat",
        "linear-create-issue"
      ],
      "persona": "task-oriented"
    },
    {
      "query": "When deals close in Salesforce, use AI to generate personalized onboarding docs",
      "triggers": [],
      "actions": [
        "openai-chat",
        "google_docs-create-document"
      ],
      "persona": "complex-workflow"
    },
    {
      "query": "I want GPT to help write better commit messages for our Github repos",
      "triggers": [
        "github-new-commit"
      ],
      "actions": [
        "openai-chat"
      ],
      "persona": "verbose"
    }
  ]
}
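
Each entry in this suite pairs a natural-language `query` with the expected `triggers` and `actions` component keys, plus a `persona` label. As a rough sketch of how one entry could be compared against a retrieval result: the code below is not the PR's `evaluator.mjs`, which is not shown here. The helper names `sameSet` and `scoreEntry`, the order-insensitive exact-match rule, and the `retrieved` object are all assumptions made for illustration.

```javascript
// Hypothetical helper: true when both lists hold the same keys, ignoring order.
function sameSet(expected, actual) {
  const a = new Set(expected);
  const b = new Set(actual);
  return a.size === b.size && [...a].every((key) => b.has(key));
}

// Hypothetical scorer: compares one suite entry with one retrieval result.
function scoreEntry(entry, retrieved) {
  return {
    query: entry.query,
    persona: entry.persona,
    triggersMatch: sameSet(entry.triggers, retrieved.triggers),
    actionsMatch: sameSet(entry.actions, retrieved.actions),
  };
}

// One entry copied from the suite.
const entry = {
  query: "When new RSS articles mention our company, use AI to analyze sentiment and alert team",
  triggers: ["rss-new-item-in-feed"],
  actions: ["openai-chat", "slack-send-message"],
  persona: "complex-workflow",
};

// A made-up retrieval result with the same actions in a different order.
const retrieved = {
  triggers: ["rss-new-item-in-feed"],
  actions: ["slack-send-message", "openai-chat"],
};

const result = scoreEntry(entry, retrieved);
console.log(result.triggersMatch, result.actionsMatch); // true true
```

Under this rule an entry with `"triggers": []` only matches when the retrieval step also returns no trigger, so whether an empty list is intentional matters for the score.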
Add missing triggers for event-driven workflows.
Several test cases have empty trigger arrays despite describing event-driven scenarios:
Consider adding appropriate triggers:
Would you like me to suggest specific triggers for each case?
Also applies to: 81-88, 123-130, 164-171, 195-202
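
The entries the comment refers to can be found mechanically. A minimal sketch, assuming the suite has already been parsed into an object with the `evaluationTests` shape shown in the diff: the helper name `queriesWithoutTriggers` is hypothetical, and the sample array holds only four entries copied from the file, with `actions` and `persona` omitted for brevity.

```javascript
// A few entries copied from eval-test-suite-ai-focus-filtered.json
// (actions and persona left out to keep the example short).
const evaluationTests = [
  {
    query: "Analyze customer feedback from Delighted with AI and update account health in Salesforce",
    triggers: [],
  },
  {
    query: "When new videos are uploaded to Zoom, I want them transcribed and summarized for the team",
    triggers: ["zoom-recording-completed"],
  },
  {
    query: "Use AI to analyze Github issues and suggest priority levels in Linear",
    triggers: [],
  },
  {
    query: "When deals close in Salesforce, use AI to generate personalized onboarding docs",
    triggers: [],
  },
];

// Hypothetical helper: returns the queries whose expected trigger list is empty.
function queriesWithoutTriggers(tests) {
  return tests
    .filter((test) => Array.isArray(test.triggers) && test.triggers.length === 0)
    .map((test) => test.query);
}

const missing = queriesWithoutTriggers(evaluationTests);
console.log(missing.length); // 3
for (const query of missing) {
  console.log("- " + query);
}
```

This only lists candidates. Deciding whether a given query really describes an event-driven workflow, and which trigger key fits it, still needs a reviewer.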