
Caching docsgpt #1308

Merged · 17 commits · Oct 15, 2024
Conversation

fadingNA (Contributor)

What kind of change does this PR introduce? (New feature: caching)

The changes are applied in the BaseLLM class so that all LLM queries (both standard and streaming) benefit from:

  • Caching of responses to improve performance.
  • Token usage tracking for monitoring API costs.

The concrete LLM implementations now apply caching and token tracking automatically, without modifying their core logic (a sketch of the decorator idea follows below).
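
For reference, here is a minimal sketch of the caching decorator idea. make_redis and gen_cache_key are names that also appear in the updated tests; the decorator name gen_cache, the key prefix, and the TTL are illustrative assumptions rather than the exact implementation.

import functools
import hashlib
import json

import redis


def make_redis():
    # Assumed connection settings; the real ones come from application config.
    return redis.Redis(host="localhost", port=6379, db=0)


def gen_cache_key(*messages):
    # One deterministic key for the whole conversation.
    raw = json.dumps(messages, sort_keys=True)
    return "llm_cache:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def gen_cache(func):
    # Return a cached response when present; otherwise call the provider
    # and store the result for later identical conversations.
    @functools.wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        cache = make_redis()
        key = gen_cache_key(*messages)
        cached = cache.get(key)
        if cached is not None:
            return cached.decode("utf-8")
        result = func(self, model, messages, *args, **kwargs)
        cache.set(key, result, ex=1800)  # 30-minute TTL, illustrative
        return result
    return wrapper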

Why was this change needed? (You can also link to an open issue here)

Other information

  • The addition of caching and token usage tracking was necessary to improve performance and reduce redundant API calls for LLM queries. This change also allows monitoring of token usage for better cost management. By caching the results of similar requests, repeated queries can retrieve cached responses, thus saving time and reducing API costs.

Additionally, the use of decorators makes the code more modular, allowing the caching and token tracking logic to be applied across different LLM implementations without modifying each one.
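
A rough illustration of how the decorator keeps the concrete LLMs unchanged: the base class wraps the provider-specific call in one place, so subclasses only implement the raw request. The _raw_gen name and the class layout are illustrative, not the exact code from this PR.

from abc import ABC, abstractmethod


def gen_cache(func):
    # Stand-in for the caching decorator sketched above.
    return func


class BaseLLM(ABC):
    def gen(self, model, messages, *args, **kwargs):
        # Caching (and token tracking) wrap the provider call once, here.
        return gen_cache(self._raw_gen)(model, messages, *args, **kwargs)

    @abstractmethod
    def _raw_gen(self, model, messages, *args, **kwargs):
        ...


class AnthropicLLM(BaseLLM):
    def _raw_gen(self, model, messages, *args, **kwargs):
        # Existing provider logic stays exactly as it was.
        return "provider response"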


vercel bot commented Oct 12, 2024

@fadingNA is attempting to deploy a commit to the Arc53 Team on Vercel.

A member of the Team first needs to authorize it.

@fadingNA (Contributor Author)

@dartpain Hi Alex, as we discussed on Discord, this is a possible feature. Please review and let me know!


vercel bot commented Oct 14, 2024

The latest updates on your projects:

Name: docs-gpt · Status: ✅ Ready · Updated (UTC): Oct 15, 2024 10:54am

@fadingNA (Contributor Author)

@dartpain Hi Alex, I have made changes to cache the whole conversation on a cache hit, as you suggested.

  • Adapted the test cases in test_anthropic.py:
    def test_gen(self):
        messages = [
            {"content": "context"},
            {"content": "question"}
        ]
        mock_response = Mock()
        mock_response.completion = "test completion"

        with patch("application.cache.make_redis") as mock_make_redis:
            mock_redis_instance = mock_make_redis.return_value
            mock_redis_instance.get.return_value = None
            mock_redis_instance.set = Mock()

            with patch.object(self.llm.anthropic.completions, "create", return_value=mock_response) as mock_create:
                response = self.llm.gen("test_model", messages)
                self.assertEqual(response, "test completion")

                prompt_expected = "### Context \n context \n ### Question \n question"
                mock_create.assert_called_with(
                    model="test_model",
                    max_tokens_to_sample=300,
                    stream=False,
                    prompt=f"{self.llm.HUMAN_PROMPT} {prompt_expected}{self.llm.AI_PROMPT}"
                )
            mock_redis_instance.set.assert_called_once()

    def test_gen_stream(self):
        messages = [
            {"content": "context"},
            {"content": "question"}
        ]
        mock_responses = [Mock(completion="response_1"), Mock(completion="response_2")]

        with patch("application.cache.make_redis") as mock_make_redis:
            mock_redis_instance = mock_make_redis.return_value
            mock_redis_instance.get.return_value = None
            mock_redis_instance.set = Mock()
  • Adapted the test cases for SageMaker:
    def test_gen(self):
        with patch('application.cache.make_redis') as mock_make_redis:
            mock_redis_instance = mock_make_redis.return_value
            mock_redis_instance.get.return_value = None

            with patch.object(self.sagemaker.runtime, 'invoke_endpoint', 
                            return_value=self.response) as mock_invoke_endpoint:
                output = self.sagemaker.gen(None, self.messages)
                mock_invoke_endpoint.assert_called_once_with(
                    EndpointName=self.sagemaker.endpoint,
                    ContentType='application/json',
                    Body=self.body_bytes
                )
                self.assertEqual(output, 
                                self.result[0]['generated_text'][len(self.prompt):])
            mock_make_redis.assert_called_once()
            mock_redis_instance.set.assert_called_once()
    
    def test_gen_stream(self):
        with patch('application.cache.make_redis') as mock_make_redis:
            mock_redis_instance = mock_make_redis.return_value
            mock_redis_instance.get.return_value = None

            with patch.object(self.sagemaker.runtime, 'invoke_endpoint_with_response_stream', 
                            return_value=self.response) as mock_invoke_endpoint:
                output = list(self.sagemaker.gen_stream(None, self.messages))
                mock_invoke_endpoint.assert_called_once_with(
                    EndpointName=self.sagemaker.endpoint,
                    ContentType='application/json',
                    Body=self.body_bytes_stream
                )
                self.assertEqual(output, [])
            mock_redis_instance.set.assert_called_once()
  • Build the cache key from the whole conversation list: cache_key = gen_cache_key(*messages) (a short illustration follows below).
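
To illustrate the whole-conversation keying, the snippet below shows the intended behaviour: the same question paired with a different context must not share a cache entry. The hash construction here is an assumption; only the gen_cache_key(*messages) call mirrors the PR.

import hashlib
import json


def gen_cache_key(*messages):
    # Illustrative: deterministic key over the full message list.
    raw = json.dumps(messages, sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


conv_a = [{"content": "context"}, {"content": "question"}]
conv_b = [{"content": "other context"}, {"content": "question"}]

assert gen_cache_key(*conv_a) != gen_cache_key(*conv_b)
assert gen_cache_key(*conv_a) == gen_cache_key(*conv_a)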

@dartpain (Contributor)

Just pushed some minor changes: a singleton for the Redis client, Docker handling, and error handling.

I also removed some docstrings there.

@dartpain merged commit bcd9005 into arc53:main Oct 15, 2024
6 checks passed
@fadingNA deleted the caching-docsgpt branch October 15, 2024 20:35
@dartpain (Contributor)

@holopin-bot @fadingNA Thank you!


holopin-bot bot commented Oct 21, 2024

Congratulations @fadingNA, the maintainer of this repository has issued you a badge! Here it is: https://holopin.io/claim/cm2jeocs406610clau1sl84cp

This badge can only be claimed by you, so make sure that your GitHub account is linked to your Holopin account. You can manage those preferences here: https://holopin.io/account.
Or if you're new to Holopin, you can simply sign up with GitHub, which will do the trick!

@fadingNA (Contributor Author)

@dartpain You're welcome, Alex!
