⚡️ Speed up `resolve_pairwise_criteria()` by 21% in `libs/langchain/langchain/evaluation/comparison/eval_chain.py` #34

codeflash-ai · 2024-02-16T18:17:49Z

📄 `resolve_pairwise_criteria()` in `libs/langchain/langchain/evaluation/comparison/eval_chain.py`

📈 Performance went up by 21% (0.21x faster)

⏱️ Runtime went down from 66.70μs to 55.00μs

Explanation and details

(click to show)

To optimize your program for increased runtime performance, I would suggest avoiding repetitive calls to isinstance() and reducing the function call resolve_pairwise_criteria(criterion) inside the list comprehension.

Here's the optimized version of the code.

In the optimized version, we store the type of criteria in a variable to avoid repeated calls which helps run the code faster. In the case where criteria is a list or tuple, instead of doing a double for loop directly in a dictionary comprehension, I have broken it down into a for loop which updates a dictionary. This can speed up the code because dictionary comprehensions are recomputed and rebuilt at runtime, whereas with dict.update(), the dictionary is updated in-place which makes it more efficient. Also, the resolution of nested criteria in case of a list is now handled by direct dict updates in place of nested dictionary comprehension, further improving speed and readability.

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

✅ 18 Passed − ⚙️ Existing Unit Tests

✅ 0 Passed − 🎨 Inspired Regression Tests

✅ 4 Passed − 🌀 Generated Regression Tests

(click to show generated tests)

# imports
import pytest  # used for our unit tests
from typing import Optional, Union, List
from enum import Enum

# Assuming the existence of these types and variables for the test cases
class Criteria(Enum):
    HELPFULNESS = 'helpfulness'
    RELEVANCE = 'relevance'
    CORRECTNESS = 'correctness'
    DEPTH = 'depth'

class ConstitutionalPrinciple:
    def __init__(self, name, critique_request):
        self.name = name
        self.critique_request = critique_request

# Mocked _SUPPORTED_CRITERIA for testing purposes
_SUPPORTED_CRITERIA = {
    Criteria.HELPFULNESS: 'Helps a lot',
    Criteria.RELEVANCE: 'Very relevant',
    Criteria.CORRECTNESS: 'Factually correct',
    Criteria.DEPTH: 'In-depth analysis',
}

# The function to test is inserted here as provided

# Unit tests

# Test when criteria is None
def test_resolve_pairwise_criteria_none():
    expected = {
        Criteria.HELPFULNESS.value: _SUPPORTED_CRITERIA[Criteria.HELPFULNESS],
        Criteria.RELEVANCE.value: _SUPPORTED_CRITERIA[Criteria.RELEVANCE],
        Criteria.CORRECTNESS.value: _SUPPORTED_CRITERIA[Criteria.CORRECTNESS],
        Criteria.DEPTH.value: _SUPPORTED_CRITERIA[Criteria.DEPTH],
    }
    assert resolve_pairwise_criteria(None) == expected

# Test when criteria is a single instance of Criteria
def test_resolve_pairwise_criteria_single_instance():
    for criterion in Criteria:
        expected = {criterion.value: _SUPPORTED_CRITERIA[criterion]}
        assert resolve_pairwise_criteria(criterion) == expected

# Test when criteria is a string that is a valid key in _SUPPORTED_CRITERIA
def test_resolve_pairwise_criteria_valid_string():
    for criterion in Criteria:
        expected = {criterion.value: _SUPPORTED_CRITERIA[criterion]}
        assert resolve_pairwise_criteria(criterion.value) == expected

# Test when criteria is a string that is not a key in _SUPPORTED_CRITERIA
def test_resolve_pairwise_criteria_invalid_string():
    assert resolve_pairwise_criteria("invalid") == {"invalid": ""}

# Test when criteria is an instance of ConstitutionalPrinciple
def test_resolve_pairwise_criteria_constitutional_principle():
    principle = ConstitutionalPrinciple("Equality", "Must treat everyone equally")
    expected = {"Equality": "Must treat everyone equally"}
    assert resolve_pairwise_criteria(principle) == expected

# Test when criteria is a list of valid Criteria instances
def test_resolve_pairwise_criteria_list_of_criteria():
    criteria_list = [Criteria.HELPFULNESS, Criteria.RELEVANCE]
    expected = {
        Criteria.HELPFULNESS.value: _SUPPORTED_CRITERIA[Criteria.HELPFULNESS],
        Criteria.RELEVANCE.value: _SUPPORTED_CRITERIA[Criteria.RELEVANCE],
    }
    assert resolve_pairwise_criteria(criteria_list) == expected

# Test when criteria is a list with a mix of valid Criteria instances and strings
def test_resolve_pairwise_criteria_mixed_list():
    criteria_list = [Criteria.HELPFULNESS, "DEPTH"]
    expected = {
        Criteria.HELPFULNESS.value: _SUPPORTED_CRITERIA[Criteria.HELPFULNESS],
        "DEPTH": "",
    }
    assert resolve_pairwise_criteria(criteria_list) == expected

# Test when criteria is an empty list
def test_resolve_pairwise_criteria_empty_list():
    with pytest.raises(ValueError):
        resolve_pairwise_criteria([])

# Test when criteria is an empty string
def test_resolve_pairwise_criteria_empty_string():
    with pytest.raises(ValueError):
        resolve_pairwise_criteria("")

# Test when criteria is an empty dictionary
def test_resolve_pairwise_criteria_empty_dict():
    assert resolve_pairwise_criteria({}) == {}

# Test when criteria is a dictionary with valid keys and values
def test_resolve_pairwise_criteria_valid_dict():
    criteria_dict = {Criteria.HELPFULNESS.value: "Helps a lot"}
    assert resolve_pairwise_criteria(criteria_dict) == criteria_dict

# Test when criteria is of an unexpected type (e.g., integer)
def test_resolve_pairwise_criteria_unexpected_type():
    with pytest.raises(TypeError):
        resolve_pairwise_criteria(42)  # Assuming the function raises TypeError for non-iterable non-Criteria

aphexcx

Correct: yes
Performant: yes? seems to only have been microbenchmarked. would have wanted to see tests with large number of values for more rigorous benchmarking
Readable: yes

⚡️ Speed up resolve_pairwise_criteria by 21%

e2b88bc

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by CodeFlash AI label Feb 16, 2024

codeflash-ai bot requested a review from aphexcx February 16, 2024 18:17

aphexcx approved these changes Feb 19, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

⚡️ Speed up `resolve_pairwise_criteria()` by 21% in `libs/langchain/langchain/evaluation/comparison/eval_chain.py` #34

⚡️ Speed up `resolve_pairwise_criteria()` by 21% in `libs/langchain/langchain/evaluation/comparison/eval_chain.py` #34

codeflash-ai bot commented Feb 16, 2024

aphexcx left a comment •

edited

Loading

⚡️ Speed up resolve_pairwise_criteria() by 21% in libs/langchain/langchain/evaluation/comparison/eval_chain.py #34

Are you sure you want to change the base?

⚡️ Speed up resolve_pairwise_criteria() by 21% in libs/langchain/langchain/evaluation/comparison/eval_chain.py #34

Conversation

codeflash-ai bot commented Feb 16, 2024

📄 resolve_pairwise_criteria() in libs/langchain/langchain/evaluation/comparison/eval_chain.py

Explanation and details

Correctness verification

✅ 18 Passed − ⚙️ Existing Unit Tests

✅ 0 Passed − 🎨 Inspired Regression Tests

✅ 4 Passed − 🌀 Generated Regression Tests

aphexcx left a comment • edited Loading

Choose a reason for hiding this comment

⚡️ Speed up `resolve_pairwise_criteria()` by 21% in `libs/langchain/langchain/evaluation/comparison/eval_chain.py` #34

⚡️ Speed up `resolve_pairwise_criteria()` by 21% in `libs/langchain/langchain/evaluation/comparison/eval_chain.py` #34

📄 `resolve_pairwise_criteria()` in `libs/langchain/langchain/evaluation/comparison/eval_chain.py`

aphexcx left a comment •

edited

Loading