Langsmith Evaluation custom metrics #26639

HemanthVikash · 2024-09-19T00:39:08Z

I am writing an evaluation that runs each example n=5 times, and I want to inspect the resulting scores. One of the scores I record is data_row_count. For each example, LangSmith shows data_row_count averaged over the repetitions, but I can't find a way to get the variance or standard deviation. Is there currently an easy way to do that?

Here is my eval code:

evaluate(
    lambda inputs: cb.ask(inputs['question']),
    data=questions_dataset_name,
    evaluators=[sql_validity_evaluator_experimentation],
    num_repetitions=5,
    experiment_prefix="sql_generator_v2_openai_gpt4o",
    description="SQL generator v2 experiment evaluation",
)

Here is my evaluator:

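from langsmith.schemas import Example, Run


def sql_validity_evaluator_experimentation(root_run: Run, example: Example) -> dict:
    # The run may have no outputs at all (TypeError) or lack the expected key (KeyError).
    try:
        llm_output = root_run.outputs['final_answer_query']
    except (KeyError, TypeError) as e:
        return {
            "results": [
                {"score": False, "key": "sql_executable", "comment": str(e)}
            ]
        }

    # check_executable and execute_sql are my own helpers (not shown here).
    sql_executable = check_executable(llm_output)
    col_count = 0
    row_count = 0

    # Only count rows and columns when the generated query actually runs.
    if sql_executable:
        sql_results = execute_sql(llm_output)
        row_count = len(sql_results)
        col_count = len(sql_results.columns)

    # One feedback entry per metric; LangSmith averages each key across repetitions.
    return {
        "results": [
            {"score": sql_executable, "key": "sql_executable", "comment": "Check if executable"},
            {"score": row_count, "key": "data_row_count", "comment": ""},
            {"score": col_count, "key": "data_column_count", "comment": ""}
        ]
    }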

Additionally, here is a screenshot of the LangSmith eval:

In this screenshot, I essentially want additional metrics (variance or SD) across the runs for each captured value (data_row_count, data_column_count, sql_executable).
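To make the request concrete, here is a minimal sketch of the per-example spread described above, computed offline with Python's standard library. It assumes the per-repetition scores have already been exported into a list of dicts with the hypothetical keys "example_id", "key", and "score"; that record shape and the function name spread_by_example are illustrative assumptions, not part of the LangSmith API. It is not a built-in LangSmith feature.

```python
from collections import defaultdict
from statistics import mean, pstdev, pvariance


def spread_by_example(records):
    """Group scores by (example_id, key) and summarize their spread.

    `records` is a hypothetical iterable of dicts with the keys
    "example_id", "key", and "score": one dict per repetition and metric.
    """
    grouped = defaultdict(list)
    for rec in records:
        if rec["score"] is None:
            continue  # skip repetitions that produced no score
        # Boolean scores (e.g. sql_executable) become 0.0 / 1.0.
        grouped[(rec["example_id"], rec["key"])].append(float(rec["score"]))

    return {
        group: {
            "n": len(scores),
            "mean": mean(scores),
            "variance": pvariance(scores),  # population variance
            "std": pstdev(scores),          # population standard deviation
        }
        for group, scores in grouped.items()
    }


# Made-up scores for one example, purely for illustration.
records = [
    {"example_id": "ex-1", "key": "data_row_count", "score": 10},
    {"example_id": "ex-1", "key": "data_row_count", "score": 12},
    {"example_id": "ex-1", "key": "data_row_count", "score": 14},
    {"example_id": "ex-1", "key": "sql_executable", "score": True},
    {"example_id": "ex-1", "key": "sql_executable", "score": False},
]
stats = spread_by_example(records)
```

The sketch uses population statistics (pvariance, pstdev), which are defined even for a single repetition. For sample estimates, statistics.variance and statistics.stdev could be swapped in, but they need at least two scores per group.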

Replies: 0 comments