    Repositories list

    • Code to reproduce the paper "Predictors from causal features do not generalize better to new domains"
      Python
      Other
      7500 Updated Oct 23, 2024
    • folktexts (Public)
      Get classification risk scores on tabular tasks using LLMs
      Jupyter Notebook
      MIT License
      0900 Updated Oct 4, 2024
    • Code to reproduce the paper "Questioning the Survey Responses of Large Language Models"
      Jupyter Notebook
      MIT License
      1700 Updated Sep 23, 2024
    • A framework for few-shot evaluation of language models.
      Python
      MIT License
      1.8k100 Updated Sep 20, 2024
    • lawma (Public)
      Lawma: A lightly fine-tuned Llama model for legal classification tasks.
      Jupyter Notebook
      0900 Updated Sep 14, 2024
    • BenchBench is a Python package to evaluate multi-task benchmarks.
      Python
      MIT License
      11100 Updated Jul 18, 2024
    • Code to reproduce the experiments in the paper Training on the Test Task Confounds Evaluation and Emergence.
      Jupyter Notebook
      0600 Updated Jul 14, 2024
    • Datasets derived from US census data
      Python
      MIT License
      2023653 Updated May 15, 2024
    • Achieve error-rate fairness between societal groups for any score-based classifier.
      Python
      MIT License
      51601 Updated Apr 26, 2024
    • tttlm (Public)
      Test-time-training on nearest neighbors for large language models
      Python
      MIT License
      42400 Updated Apr 18, 2024
    • Code for "Is your model predicting the past?"
      Jupyter Notebook
      MIT License
      0100 Updated Mar 10, 2024
    • whynot (Public)
      A Python sandbox for decision making in dynamics
      Python
      MIT License
      4541782 Updated Aug 21, 2023