Caching docsgpt #1308
Conversation
@fadingNA is attempting to deploy a commit to the Arc53 Team on Vercel. A member of the Team first needs to authorize it.
@dartpain Hi Alex, as we discussed on Discord, this is a possible feature. Please review and let me know!
@dartpain Hi Alex, I have made a change to cache the whole conversation on a cache hit, as you suggested.
def test_gen(self):
    messages = [
        {"content": "context"},
        {"content": "question"}
    ]
    mock_response = Mock()
    mock_response.completion = "test completion"

    with patch("application.cache.make_redis") as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None
        mock_redis_instance.set = Mock()

        with patch.object(self.llm.anthropic.completions, "create",
                          return_value=mock_response) as mock_create:
            response = self.llm.gen("test_model", messages)
            self.assertEqual(response, "test completion")

            prompt_expected = "### Context \n context \n ### Question \n question"
            mock_create.assert_called_with(
                model="test_model",
                max_tokens_to_sample=300,
                stream=False,
                prompt=f"{self.llm.HUMAN_PROMPT} {prompt_expected}{self.llm.AI_PROMPT}"
            )
            mock_redis_instance.set.assert_called_once()

def test_gen_stream(self):
    messages = [
        {"content": "context"},
        {"content": "question"}
    ]
    mock_responses = [Mock(completion="response_1"), Mock(completion="response_2")]

    with patch("application.cache.make_redis") as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None
        mock_redis_instance.set = Mock()
def test_gen(self):
    with patch('application.cache.make_redis') as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None

        with patch.object(self.sagemaker.runtime, 'invoke_endpoint',
                          return_value=self.response) as mock_invoke_endpoint:
            output = self.sagemaker.gen(None, self.messages)
            mock_invoke_endpoint.assert_called_once_with(
                EndpointName=self.sagemaker.endpoint,
                ContentType='application/json',
                Body=self.body_bytes
            )
            self.assertEqual(output,
                             self.result[0]['generated_text'][len(self.prompt):])

        mock_make_redis.assert_called_once()
        mock_redis_instance.set.assert_called_once()

def test_gen_stream(self):
    with patch('application.cache.make_redis') as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None

        with patch.object(self.sagemaker.runtime, 'invoke_endpoint_with_response_stream',
                          return_value=self.response) as mock_invoke_endpoint:
            output = list(self.sagemaker.gen_stream(None, self.messages))
            mock_invoke_endpoint.assert_called_once_with(
                EndpointName=self.sagemaker.endpoint,
                ContentType='application/json',
                Body=self.body_bytes_stream
            )
            self.assertEqual(output, [])

        mock_redis_instance.set.assert_called_once()
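
The Anthropic and SageMaker tests above only mock application.cache.make_redis, so for context here is a minimal sketch of the kind of caching decorator they exercise. The names gen_cache and get_cache_key, the key scheme, and the one-hour expiry are assumptions for illustration, not the merged implementation:

import hashlib
import json
from functools import wraps

from application.cache import make_redis  # the module the tests patch


def get_cache_key(messages):
    # Hypothetical helper: derive a deterministic key from the full conversation.
    return "llm_cache:" + hashlib.md5(
        json.dumps(messages, sort_keys=True).encode("utf-8")
    ).hexdigest()


def gen_cache(func):
    # Wraps an LLM's gen() so a repeated conversation is answered from Redis.
    @wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        redis_client = make_redis()
        key = get_cache_key(messages)
        cached = redis_client.get(key)
        if cached is not None:
            # Cache hit: return the stored response without calling the provider.
            return cached.decode("utf-8") if isinstance(cached, bytes) else cached
        result = func(self, model, messages, *args, **kwargs)
        redis_client.set(key, result, ex=3600)  # expiry value is an assumption
        return result
    return wrapper

On a hit the wrapped provider call is skipped entirely and the stored response is returned, which matches the "cache the whole conversation" behaviour described above and the tests' expectation that set() is called exactly once on a miss.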
Just pushed some minor changes: a singleton for the Redis connection, Docker handling, and error handling. I also removed some docstrings there.
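
As a rough illustration of the singleton and error-handling part (the connection settings, the safe_cache_get helper, and the degrade-to-a-cache-miss behaviour are assumptions, not the actual application/cache.py):

import logging

import redis

_redis_instance = None


def make_redis(host="localhost", port=6379):
    # Lazily create one shared client (singleton) so every LLM call reuses the
    # same connection pool instead of opening a new connection per request.
    global _redis_instance
    if _redis_instance is None:
        _redis_instance = redis.Redis(host=host, port=port, socket_connect_timeout=2)
    return _redis_instance


def safe_cache_get(key):
    # Error handling: an unreachable Redis (e.g. the Docker service is down)
    # degrades to a cache miss instead of failing the whole request.
    try:
        return make_redis().get(key)
    except redis.RedisError as exc:
        logging.warning("Redis unavailable, skipping cache: %s", exc)
        return None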
@holopin-bot @fadingNA Thank you!
Congratulations @fadingNA, the maintainer of this repository has issued you a badge! Here it is: https://holopin.io/claim/cm2jeocs406610clau1sl84cp This badge can only be claimed by you, so make sure that your GitHub account is linked to your Holopin account. You can manage those preferences here: https://holopin.io/account.
@dartpain You're welcome, Alex.
What kind of change does this PR introduce? (New feature: caching)
The changes are applied in the BaseLLM class to ensure that all LLM queries (both standard and streaming) benefit from caching.
Why was this change needed? (You can also link to an open issue here)
Other information
Additionally, the use of decorators makes the code more modular, allowing the caching and token tracking logic to be applied across different LLM implementations without modifying each one.
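
For illustration only, a rough sketch of how the decorator approach can be wired into BaseLLM; the decorator names, the _raw_gen hook, and the token counting below are assumptions, not the merged code:

from abc import ABC, abstractmethod
from functools import wraps


def gen_cache(func):
    # Stand-in for the caching decorator sketched earlier in this thread.
    @wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        return func(self, model, messages, *args, **kwargs)
    return wrapper


def gen_token_usage(func):
    # Stand-in for a token-tracking decorator; the counting rule is illustrative only.
    @wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        result = func(self, model, messages, *args, **kwargs)
        self.token_usage["generated_tokens"] += len(str(result).split())
        return result
    return wrapper


class BaseLLM(ABC):
    def __init__(self):
        self.token_usage = {"prompt_tokens": 0, "generated_tokens": 0}

    @abstractmethod
    def _raw_gen(self, model, messages, stream=False):
        """Provider-specific call implemented by each subclass (OpenAI, Anthropic, SageMaker, ...)."""

    @gen_cache
    @gen_token_usage
    def gen(self, model, messages, stream=False):
        # Both decorators wrap the same entry point, so every subclass gets
        # caching and token tracking without changing its own _raw_gen.
        return self._raw_gen(model, messages, stream=stream)

Because the decorators sit on the shared entry point rather than on each provider class, adding a new LLM backend only requires implementing the raw call; caching and token tracking come along for free.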