Caching docsgpt #1308
Conversation
@fadingNA is attempting to deploy a commit to the Arc53 Team on Vercel. A member of the Team first needs to authorize it.
@dartpain Hi Alex, as we discussed on Discord, this is a possible feature. Please review and let me know!
@dartpain Hi Alex, I have made a change to cache the whole conversation on a cache hit, as you suggested.
def test_gen(self):
    messages = [
        {"content": "context"},
        {"content": "question"}
    ]
    mock_response = Mock()
    mock_response.completion = "test completion"

    with patch("application.cache.make_redis") as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None
        mock_redis_instance.set = Mock()

        with patch.object(self.llm.anthropic.completions, "create",
                          return_value=mock_response) as mock_create:
            response = self.llm.gen("test_model", messages)
            self.assertEqual(response, "test completion")

            prompt_expected = "### Context \n context \n ### Question \n question"
            mock_create.assert_called_with(
                model="test_model",
                max_tokens_to_sample=300,
                stream=False,
                prompt=f"{self.llm.HUMAN_PROMPT} {prompt_expected}{self.llm.AI_PROMPT}"
            )
            mock_redis_instance.set.assert_called_once()

def test_gen_stream(self):
    messages = [
        {"content": "context"},
        {"content": "question"}
    ]
    mock_responses = [Mock(completion="response_1"), Mock(completion="response_2")]

    with patch("application.cache.make_redis") as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None
        mock_redis_instance.set = Mock()
def test_gen(self):
    with patch('application.cache.make_redis') as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None

        with patch.object(self.sagemaker.runtime, 'invoke_endpoint',
                          return_value=self.response) as mock_invoke_endpoint:
            output = self.sagemaker.gen(None, self.messages)
            mock_invoke_endpoint.assert_called_once_with(
                EndpointName=self.sagemaker.endpoint,
                ContentType='application/json',
                Body=self.body_bytes
            )
            self.assertEqual(output,
                             self.result[0]['generated_text'][len(self.prompt):])

        mock_make_redis.assert_called_once()
        mock_redis_instance.set.assert_called_once()

def test_gen_stream(self):
    with patch('application.cache.make_redis') as mock_make_redis:
        mock_redis_instance = mock_make_redis.return_value
        mock_redis_instance.get.return_value = None

        with patch.object(self.sagemaker.runtime, 'invoke_endpoint_with_response_stream',
                          return_value=self.response) as mock_invoke_endpoint:
            output = list(self.sagemaker.gen_stream(None, self.messages))
            mock_invoke_endpoint.assert_called_once_with(
                EndpointName=self.sagemaker.endpoint,
                ContentType='application/json',
                Body=self.body_bytes_stream
            )
            self.assertEqual(output, [])

        mock_redis_instance.set.assert_called_once()
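
The Anthropic and SageMaker tests above only mock application.cache.make_redis, so for context here is a minimal sketch of the kind of caching decorator they exercise. The names gen_cache and get_cache_key, the key scheme, and the one-hour expiry are assumptions for illustration, not the merged implementation:

import hashlib
import json
from functools import wraps

from application.cache import make_redis  # the module the tests patch


def get_cache_key(messages):
    # Hypothetical helper: derive a deterministic key from the full conversation.
    return "llm_cache:" + hashlib.md5(
        json.dumps(messages, sort_keys=True).encode("utf-8")
    ).hexdigest()


def gen_cache(func):
    # Wraps an LLM's gen() so a repeated conversation is answered from Redis.
    @wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        redis_client = make_redis()
        key = get_cache_key(messages)
        cached = redis_client.get(key)
        if cached is not None:
            # Cache hit: return the stored response without calling the provider.
            return cached.decode("utf-8") if isinstance(cached, bytes) else cached
        result = func(self, model, messages, *args, **kwargs)
        redis_client.set(key, result, ex=3600)  # expiry value is an assumption
        return result
    return wrapper

On a hit the wrapped provider call is skipped entirely and the stored response is returned, which matches the "cache the whole conversation" behaviour described above and the tests' expectation that set() is called exactly once on a miss.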
Just pushed some minor changes: a singleton for the Redis connection, Docker handling, and error handling. I also removed some docstrings there.
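
As a rough illustration of the singleton and error-handling part (the connection settings, the safe_cache_get helper, and the degrade-to-a-cache-miss behaviour are assumptions, not the actual application/cache.py):

import logging

import redis

_redis_instance = None


def make_redis(host="localhost", port=6379):
    # Lazily create one shared client (singleton) so every LLM call reuses the
    # same connection pool instead of opening a new connection per request.
    global _redis_instance
    if _redis_instance is None:
        _redis_instance = redis.Redis(host=host, port=port, socket_connect_timeout=2)
    return _redis_instance


def safe_cache_get(key):
    # Error handling: an unreachable Redis (e.g. the Docker service is down)
    # degrades to a cache miss instead of failing the whole request.
    try:
        return make_redis().get(key)
    except redis.RedisError as exc:
        logging.warning("Redis unavailable, skipping cache: %s", exc)
        return None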
@holopin-bot @fadingNA Thank you!
Congratulations @fadingNA, the maintainer of this repository has issued you a badge! Here it is: https://holopin.io/claim/cm2jeocs406610clau1sl84cp This badge can only be claimed by you, so make sure that your GitHub account is linked to your Holopin account. You can manage those preferences here: https://holopin.io/account.
@dartpain You're welcome, Alex.
What kind of change does this PR introduce? (New feature: caching)
The changes are applied in the BaseLLM class to ensure that all LLM queries (both standard and streaming) benefit from caching.
Why was this change needed? (You can also link to an open issue here)
Other information
Additionally, the use of decorators makes the code more modular, allowing the caching and token tracking logic to be applied across different LLM implementations without modifying each one.
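
For illustration only, a rough sketch of how the decorator approach can be wired into BaseLLM; the decorator names, the _raw_gen hook, and the token counting below are assumptions, not the merged code:

from abc import ABC, abstractmethod
from functools import wraps


def gen_cache(func):
    # Stand-in for the caching decorator sketched earlier in this thread.
    @wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        return func(self, model, messages, *args, **kwargs)
    return wrapper


def gen_token_usage(func):
    # Stand-in for a token-tracking decorator; the counting rule is illustrative only.
    @wraps(func)
    def wrapper(self, model, messages, *args, **kwargs):
        result = func(self, model, messages, *args, **kwargs)
        self.token_usage["generated_tokens"] += len(str(result).split())
        return result
    return wrapper


class BaseLLM(ABC):
    def __init__(self):
        self.token_usage = {"prompt_tokens": 0, "generated_tokens": 0}

    @abstractmethod
    def _raw_gen(self, model, messages, stream=False):
        """Provider-specific call implemented by each subclass (OpenAI, Anthropic, SageMaker, ...)."""

    @gen_cache
    @gen_token_usage
    def gen(self, model, messages, stream=False):
        # Both decorators wrap the same entry point, so every subclass gets
        # caching and token tracking without changing its own _raw_gen.
        return self._raw_gen(model, messages, stream=stream)

Because the decorators sit on the shared entry point rather than on each provider class, adding a new LLM backend only requires implementing the raw call; caching and token tracking come along for free.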