
LLM reporting Middleware that tracks token counts for AI APIs #6338

Open

wants to merge 2 commits into base: master
Conversation

lonelycode
Member

@lonelycode lonelycode commented Jun 11, 2024

User description

Adds a new middleware that uses tiktoken to count AI message tokens (supports OpenAI and Anthropic).

Description

  • Adds mw_llm_reporter.go middleware
  • Hooks after transforms and auth
  • The middleware is only loaded if the API is tagged with llm
  • LLMReporter middleware will
    • decode outbound body to a base API message format used by both Anthropic and OpenAI (Gemini uses a different content structure)
    • attempt to build a content blob from the message payload
    • attempt to detect the model being used in the request
    • use the tiktoken library to estimate the number of tokens; if it can't get a clear read on the model being used, it will tag the count as estimated
    • Set three values into the context: ctx.LLMReport_Model, ctx.LLMReport_NumTokens, and ctx.LLMReport_Estimate.
  • These context values can later be used in the RecordHit function to store the data in an analytics record (as this is part of Tyk Pump I did not extend the pump record)

Motivation and Context

We've had multiple requests for this kind of reporting, and it is relatively easy to add to our analytics. It is also something our competitors already offer, so this gives us a matching feature.

How This Has Been Tested

  • Manual testing directly with dummy requests from OpenAI and Anthropic docs

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

PR Type

Enhancement, Dependencies


Description

  • Added new middleware LLMReport to track token counts for AI APIs using the tiktoken library.
  • Introduced constants and functions to handle LLM report data in the request context.
  • Updated API processing to include the new middleware.
  • Modified quickstart API configuration to support LLM reporting.
  • Added necessary dependencies for the new middleware.

Changes walkthrough 📝

Relevant files

Enhancement

ctx/ctx.go: Add constants for LLM report context values.
  • Added constants for LLM report model, token count, and estimate.
  • +3/-0

gateway/api.go: Add functions to retrieve LLM report data from context.
  • Added functions to get LLM report token count, model name, and estimate from request context.
  • +21/-0

gateway/api_loader.go: Enable LLMReport middleware in API processing.
  • Enabled LLMReport middleware in the API processing chain.
  • +1/-0

gateway/mw_llm_reporter.go: Implement LLMReport middleware for token counting.
  • Added new middleware to decode request body, detect model, count tokens, and set context values.
  • Implemented functions to decode body, detect model, and count tokens using the tiktoken library.
  • +132/-0

apps/quickstart.json: Update quickstart API configuration for LLM reporting.
  • Added llm tag to the quickstart API configuration.
  • Enabled tracking by setting do_not_track to false.
  • +2/-1

Dependencies

go.mod: Add dependencies for LLM reporting middleware.
  • Added tiktoken-go and regexp2 dependencies.
  • +2/-0

go.sum: Update go.sum with new dependencies.
  • Updated to include checksums for new dependencies.
  • +4/-0
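The quickstart change amounts to roughly the following fragment of apps/quickstart.json; the tags and do_not_track fields are real Tyk API definition fields, while the surrounding values are illustrative.

```json
{
  "name": "Quickstart",
  "tags": ["llm"],
  "do_not_track": false
}
```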


    @github-actions github-actions bot added enhancement dependencies Pull requests that update a dependency file labels Jun 11, 2024
    Contributor

    github-actions bot commented Jun 11, 2024

    API Changes

    --- prev.txt	2024-06-11 05:38:00.319166295 +0000
    +++ current.txt	2024-06-11 05:37:57.343136072 +0000
    @@ -6859,6 +6859,9 @@
     	GraphQLRequest
     	GraphQLIsWebSocketUpgrade
     	OASOperation
    +	LLMReport_Model
    +	LLMReport_NumRequestTokens
    +	LLMReport_Estimate
     
     	// CacheOptions holds cache options required for cache writer middleware.
     	CacheOptions
    @@ -8790,6 +8793,16 @@
     
     func (l *LDAPStorageHandler) SetRollingWindow(keyName string, per int64, val string, pipeline bool) (int, []interface{})
     
    +type LLMReport struct {
    +	*BaseMiddleware
    +}
    +
    +func (sa *LLMReport) EnabledForSpec() bool
    +
    +func (sa *LLMReport) Name() string
    +
    +func (sa *LLMReport) ProcessRequest(w http.ResponseWriter, r *http.Request, _ interface{}) (error, int)
    +
     type LogMessageEventHandler struct {
     	Gw *Gateway `json:"-"`
     	// Has unexported fields.

    Contributor

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review [1-5]

    4

    🧪 Relevant tests

    No

    🔒 Security concerns

    No

    ⚡ Key issues to review

    Possible Bug:
    The middleware name in LLMReport is incorrectly set to "StripAuth". This should be corrected to reflect the actual middleware purpose, such as "LLMReport".

    Error Handling:
    The decodeBody and countTokens methods could potentially return errors that are not handled in a way that differentiates between different types of errors (e.g., network vs. parsing errors). More granular error handling might be beneficial.

    Performance Concern:
    The countTokens method concatenates strings in a loop, which can be inefficient for large numbers of messages. Consider using a strings.Builder for more efficient string concatenation.
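The strings.Builder point can be sketched as follows; the message struct is a hypothetical stand-in for the middleware's msgObject, and the point is that Builder grows one buffer instead of allocating a new string per concatenation.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a hypothetical stand-in for the middleware's msgObject.
type message struct {
	Role    string
	Content string
}

// buildContentBlob joins message contents with a strings.Builder,
// avoiding the quadratic allocation cost of s += m.Content in a loop.
func buildContentBlob(msgs []message) string {
	var b strings.Builder
	for _, m := range msgs {
		b.WriteString(m.Content)
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	msgs := []message{{"user", "hello"}, {"assistant", "hi"}}
	fmt.Printf("%q\n", buildContentBlob(msgs)) // "hello\nhi\n"
}
```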

    Contributor

    PR Code Suggestions ✨

    Possible bug
    Implement safe type assertions with error handling to prevent runtime panics

    Add error handling for type assertions when extracting values from the context to prevent
    runtime panics if the types do not match expected.

    gateway/api.go [3109-3124]

    -return v.(int)
    -return v.(string)
    -return v.(bool)
    +if numTokens, ok := v.(int); ok {
    +  return numTokens
    +}
    +return 0
     
    +if modelName, ok := v.(string); ok {
    +  return modelName
    +}
    +return ""
    +
    +if isEstimate, ok := v.(bool); ok {
    +  return isEstimate
    +}
    +return false
    +
    Suggestion importance[1-10]: 10

    Why: This suggestion addresses a critical issue by adding error handling for type assertions, which prevents potential runtime panics. This is crucial for the robustness and stability of the application.

    10
    Add handling for missing Content-Type header to prevent potential nil pointer dereference

    Consider handling the case where the Content-Type header is not set at all, which could
    lead to a nil pointer dereference when calling r.Header.Get("Content-Type").

    gateway/mw_llm_reporter.go [48-49]

    -if r.Header.Get("Content-Type") != "application/json" {
    +contentType := r.Header.Get("Content-Type")
    +if contentType == "" || contentType != "application/json" {
       return nil, fmt.Errorf("Content-Type is not application/json")
     }
     
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential bug that could lead to a nil pointer dereference, which is a significant issue. The improved code ensures that the Content-Type header is checked for both presence and correctness.

    9
    Enhancement
    Change the middleware name to accurately reflect its purpose

    Use a more descriptive middleware name instead of "StripAuth" which seems unrelated to the
    actual functionality of LLM reporting.

    gateway/mw_llm_reporter.go [30]

    -return "StripAuth"
    +return "LLMReport"
     
    Suggestion importance[1-10]: 8

    Why: Changing the middleware name to accurately reflect its purpose enhances code readability and maintainability. This is an important improvement, although not as critical as fixing bugs.

    8
    Maintainability
    Use a constant for the default model name to improve maintainability

    Replace the hardcoded model name "gpt-3.5-turbo" with a constant to avoid magic strings
    and facilitate easier updates or configurations.

    gateway/mw_llm_reporter.go [72]

    -model := "gpt-3.5-turbo"
    +const defaultModel = "gpt-3.5-turbo"
    +model := defaultModel
     
    Suggestion importance[1-10]: 7

    Why: Using a constant for the model name improves maintainability and readability of the code by avoiding magic strings. However, this is a minor improvement compared to fixing potential bugs.

    7

    Contributor

    💥 CI tests failed 🙈

    git-state

    diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
    index 305ee78..bff0c82 100644
    --- a/gateway/mw_llm_reporter.go
    +++ b/gateway/mw_llm_reporter.go
    @@ -8,8 +8,9 @@ import (
     	"os"
     	"strings"
     
    -	"github.com/TykTechnologies/tyk/ctx"
     	"github.com/pkoukk/tiktoken-go"
    +
    +	"github.com/TykTechnologies/tyk/ctx"
     )
     
     type msgObject struct {

    Please look at the run or in the Checks tab.

    Contributor

    💥 CI tests failed 🙈

    git-state

    diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
    index 8336ca2..413f9ce 100644
    --- a/gateway/mw_llm_reporter.go
    +++ b/gateway/mw_llm_reporter.go
    @@ -8,8 +8,9 @@ import (
     	"os"
     	"strings"
     
    -	"github.com/TykTechnologies/tyk/ctx"
     	"github.com/pkoukk/tiktoken-go"
    +
    +	"github.com/TykTechnologies/tyk/ctx"
     )
     
     type msgObject struct {

    Please look at the run or in the Checks tab.

    1 similar comment

    @buger
    Member

    buger commented Jun 13, 2024

@lonelycode all LLMs these days return the number of tokens spent for the request and response, so maybe instead of calculating it before the request, calculate it after the response? And then update the rates?
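To illustrate the suggestion: both providers report exact usage in the response body (OpenAI under usage.prompt_tokens/completion_tokens/total_tokens, Anthropic under usage.input_tokens/output_tokens), so a response hook could read real counts instead of estimating with tiktoken. A minimal sketch; the struct and helper names are hypothetical, not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage covers both shapes: OpenAI populates prompt/completion/total_tokens,
// Anthropic populates input/output_tokens; absent fields decode to zero.
type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
	InputTokens      int `json:"input_tokens"`
	OutputTokens     int `json:"output_tokens"`
}

type llmResponse struct {
	Usage usage `json:"usage"`
}

// totalTokens returns the provider-reported spend, preferring OpenAI's
// total_tokens and falling back to Anthropic's input+output sum.
func totalTokens(body []byte) (int, error) {
	var resp llmResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	if resp.Usage.TotalTokens > 0 {
		return resp.Usage.TotalTokens, nil
	}
	return resp.Usage.InputTokens + resp.Usage.OutputTokens, nil
}

func main() {
	openai := []byte(`{"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}`)
	anthropic := []byte(`{"usage":{"input_tokens":10,"output_tokens":25}}`)
	a, _ := totalTokens(openai)
	b, _ := totalTokens(anthropic)
	fmt.Println(a, b) // 21 35
}
```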
