
LLM reporting Middleware that tracks token counts for AI APIs #6338

Open

wants to merge 2 commits into base: master
Conversation

lonelycode
Member

@lonelycode lonelycode commented Jun 11, 2024

User description

Adds a new middleware that uses tiktoken to count AI message tokens (supports OpenAI and Anthropic).

Description

  • Adds mw_llm_reporter.go middleware
  • Hooks after transforms and auth
  • The middleware is only loaded if the API is tagged with llm
  • LLMReporter middleware will
    • decode outbound body to a base API message format used by both Anthropic and OpenAI (Gemini uses a different content structure)
    • attempt to build a content blob from the message payload
    • attempt to detect the model being used in the request
    • use the tiktoken library to estimate the number of tokens; if it can't get a clear read on the model being used, it will tag the count as estimated
    • Set three values into the context: ctx.LLMReport_Model, ctx.LLMReport_NumTokens, and ctx.LLMReport_Estimate.
  • These context values can later be used in the RecordHit function to store the data in an analytics record (as this is part of Tyk Pump I did not extend the pump record)

Motivation and Context

We've had multiple requests for this kind of reporting, and it is relatively easy to add to our analytics. It is also something our competitors already offer, so this gives us a matching feature.

How This Has Been Tested

  • Manual testing directly with dummy requests from OpenAI and Anthropic docs

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

PR Type

Enhancement, Dependencies


Description

  • Added new middleware LLMReport to track token counts for AI APIs using the tiktoken library.
  • Introduced constants and functions to handle LLM report data in the request context.
  • Updated API processing to include the new middleware.
  • Modified quickstart API configuration to support LLM reporting.
  • Added necessary dependencies for the new middleware.

Changes walkthrough 📝

Relevant files

Enhancement

ctx/ctx.go: Add constants for LLM report context values.
  • Added constants for LLM report model, token count, and estimate.
  • +3/-0

gateway/api.go: Add functions to retrieve LLM report data from context.
  • Added functions to get LLM report token count, model name, and estimate from request context.
  • +21/-0

gateway/api_loader.go: Enable LLMReport middleware in API processing.
  • Enabled LLMReport middleware in the API processing chain.
  • +1/-0

gateway/mw_llm_reporter.go: Implement LLMReport middleware for token counting.
  • Added new middleware to decode request body, detect model, count tokens, and set context values.
  • Implemented functions to decode body, detect model, and count tokens using the tiktoken library.
  • +132/-0

apps/quickstart.json: Update quickstart API configuration for LLM reporting.
  • Added llm tag to the quickstart API configuration.
  • Enabled tracking by setting do_not_track to false.
  • +2/-1

Dependencies

go.mod: Add dependencies for LLM reporting middleware.
  • Added tiktoken-go and regexp2 dependencies.
  • +2/-0

go.sum: Update go.sum with new dependencies.
  • Updated to include checksums for new dependencies.
  • +4/-0
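The quickstart change amounts to roughly the following fragment of apps/quickstart.json; the tags and do_not_track fields are real Tyk API definition fields, while the surrounding values are illustrative.

```json
{
  "name": "Quickstart",
  "tags": ["llm"],
  "do_not_track": false
}
```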


    @github-actions github-actions bot added enhancement dependencies Pull requests that update a dependency file labels Jun 11, 2024
    Contributor

    github-actions bot commented Jun 11, 2024

    API Changes

    --- prev.txt	2024-06-11 05:38:00.319166295 +0000
    +++ current.txt	2024-06-11 05:37:57.343136072 +0000
    @@ -6859,6 +6859,9 @@
     	GraphQLRequest
     	GraphQLIsWebSocketUpgrade
     	OASOperation
    +	LLMReport_Model
    +	LLMReport_NumRequestTokens
    +	LLMReport_Estimate
     
     	// CacheOptions holds cache options required for cache writer middleware.
     	CacheOptions
    @@ -8790,6 +8793,16 @@
     
     func (l *LDAPStorageHandler) SetRollingWindow(keyName string, per int64, val string, pipeline bool) (int, []interface{})
     
    +type LLMReport struct {
    +	*BaseMiddleware
    +}
    +
    +func (sa *LLMReport) EnabledForSpec() bool
    +
    +func (sa *LLMReport) Name() string
    +
    +func (sa *LLMReport) ProcessRequest(w http.ResponseWriter, r *http.Request, _ interface{}) (error, int)
    +
     type LogMessageEventHandler struct {
     	Gw *Gateway `json:"-"`
     	// Has unexported fields.

    Contributor

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review [1-5]

    4

    🧪 Relevant tests

    No

    🔒 Security concerns

    No

    ⚡ Key issues to review

    Possible Bug:
    The middleware name in LLMReport is incorrectly set to "StripAuth". This should be corrected to reflect the actual middleware purpose, such as "LLMReport".

    Error Handling:
    The decodeBody and countTokens methods could potentially return errors that are not handled in a way that differentiates between different types of errors (e.g., network vs. parsing errors). More granular error handling might be beneficial.

    Performance Concern:
    The countTokens method concatenates strings in a loop, which can be inefficient for large numbers of messages. Consider using a strings.Builder for more efficient string concatenation.
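The strings.Builder point can be sketched as follows; the message struct is a hypothetical stand-in for the middleware's msgObject, and the point is that Builder grows one buffer instead of allocating a new string per concatenation.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a hypothetical stand-in for the middleware's msgObject.
type message struct {
	Role    string
	Content string
}

// buildContentBlob joins message contents with a strings.Builder,
// avoiding the quadratic allocation cost of s += m.Content in a loop.
func buildContentBlob(msgs []message) string {
	var b strings.Builder
	for _, m := range msgs {
		b.WriteString(m.Content)
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	msgs := []message{{"user", "hello"}, {"assistant", "hi"}}
	fmt.Printf("%q\n", buildContentBlob(msgs)) // "hello\nhi\n"
}
```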

    Contributor

    PR Code Suggestions ✨

    Possible bug
    Implement safe type assertions with error handling to prevent runtime panics

    Add error handling for type assertions when extracting values from the context to prevent
    runtime panics if the types do not match expected.

    gateway/api.go [3109-3124]

    -return v.(int)
    -return v.(string)
    -return v.(bool)
    +if numTokens, ok := v.(int); ok {
    +  return numTokens
    +}
    +return 0
     
    +if modelName, ok := v.(string); ok {
    +  return modelName
    +}
    +return ""
    +
    +if isEstimate, ok := v.(bool); ok {
    +  return isEstimate
    +}
    +return false
    +
    Suggestion importance[1-10]: 10

    Why: This suggestion addresses a critical issue by adding error handling for type assertions, which prevents potential runtime panics. This is crucial for the robustness and stability of the application.

    10
    Add handling for missing Content-Type header to prevent potential nil pointer dereference

    Consider handling the case where the Content-Type header is not set at all, which could
    lead to a nil pointer dereference when calling r.Header.Get("Content-Type").

    gateway/mw_llm_reporter.go [48-49]

    -if r.Header.Get("Content-Type") != "application/json" {
    +contentType := r.Header.Get("Content-Type")
    +if contentType == "" || contentType != "application/json" {
       return nil, fmt.Errorf("Content-Type is not application/json")
     }
     
    Suggestion importance[1-10]: 9

    Why: This suggestion addresses a potential bug that could lead to a nil pointer dereference, which is a significant issue. The improved code ensures that the Content-Type header is checked for both presence and correctness.

    9
    Enhancement
    Change the middleware name to accurately reflect its purpose

    Use a more descriptive middleware name instead of "StripAuth" which seems unrelated to the
    actual functionality of LLM reporting.

    gateway/mw_llm_reporter.go [30]

    -return "StripAuth"
    +return "LLMReport"
     
    Suggestion importance[1-10]: 8

    Why: Changing the middleware name to accurately reflect its purpose enhances code readability and maintainability. This is an important improvement, although not as critical as fixing bugs.

    8
    Maintainability
    Use a constant for the default model name to improve maintainability

    Replace the hardcoded model name "gpt-3.5-turbo" with a constant to avoid magic strings
    and facilitate easier updates or configurations.

    gateway/mw_llm_reporter.go [72]

    -model := "gpt-3.5-turbo"
    +const defaultModel = "gpt-3.5-turbo"
    +model := defaultModel
     
    Suggestion importance[1-10]: 7

    Why: Using a constant for the model name improves maintainability and readability of the code by avoiding magic strings. However, this is a minor improvement compared to fixing potential bugs.

    7

    Contributor

    💥 CI tests failed 🙈

    git-state

    diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
    index 305ee78..bff0c82 100644
    --- a/gateway/mw_llm_reporter.go
    +++ b/gateway/mw_llm_reporter.go
    @@ -8,8 +8,9 @@ import (
     	"os"
     	"strings"
     
    -	"github.com/TykTechnologies/tyk/ctx"
     	"github.com/pkoukk/tiktoken-go"
    +
    +	"github.com/TykTechnologies/tyk/ctx"
     )
     
     type msgObject struct {

    Please look at the run or in the Checks tab.

    Contributor

    💥 CI tests failed 🙈

    git-state

    diff --git a/gateway/mw_llm_reporter.go b/gateway/mw_llm_reporter.go
    index 8336ca2..413f9ce 100644
    --- a/gateway/mw_llm_reporter.go
    +++ b/gateway/mw_llm_reporter.go
    @@ -8,8 +8,9 @@ import (
     	"os"
     	"strings"
     
    -	"github.com/TykTechnologies/tyk/ctx"
     	"github.com/pkoukk/tiktoken-go"
    +
    +	"github.com/TykTechnologies/tyk/ctx"
     )
     
     type msgObject struct {

    Please look at the run or in the Checks tab.

    1 similar comment

    @buger
    Member

    buger commented Jun 13, 2024

@lonelycode all LLMs these days return the number of tokens spent for the request and response, so maybe instead of calculating it before the request, calculate it after the response? And then update the rates?
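To illustrate the suggestion: both providers report exact usage in the response body (OpenAI under usage.prompt_tokens/completion_tokens/total_tokens, Anthropic under usage.input_tokens/output_tokens), so a response hook could read real counts instead of estimating with tiktoken. A minimal sketch; the struct and helper names are hypothetical, not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage covers both shapes: OpenAI populates prompt/completion/total_tokens,
// Anthropic populates input/output_tokens; absent fields decode to zero.
type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
	InputTokens      int `json:"input_tokens"`
	OutputTokens     int `json:"output_tokens"`
}

type llmResponse struct {
	Usage usage `json:"usage"`
}

// totalTokens returns the provider-reported spend, preferring OpenAI's
// total_tokens and falling back to Anthropic's input+output sum.
func totalTokens(body []byte) (int, error) {
	var resp llmResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	if resp.Usage.TotalTokens > 0 {
		return resp.Usage.TotalTokens, nil
	}
	return resp.Usage.InputTokens + resp.Usage.OutputTokens, nil
}

func main() {
	openai := []byte(`{"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}`)
	anthropic := []byte(`{"usage":{"input_tokens":10,"output_tokens":25}}`)
	a, _ := totalTokens(openai)
	b, _ := totalTokens(anthropic)
	fmt.Println(a, b) // 21 35
}
```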
