This repository corresponds to the Master's thesis in Artificial Intelligence by Tarmo Pungas at the University of Amsterdam, 2024.
The repo is based on the code from the paper "The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets" by Samuel Marks and Max Tegmark. We thank the authors for making their code publicly available.
- Navigate to the location that you want to clone this repo to, clone and enter the repo, and install the requirements.
git clone https://github.com/tarmopungas/msc-thesis.git
cd msc-thesis
pip install -r requirements.txt
- Add any .csv datasets you would like to work with to the datasets folder. See datasets/experiment_cps.csv for how to format the files.
- If you are using locally stored language models, specify the absolute path for the directory with model weights in config.ini. You can also use HuggingFace repos.
- Generate activations for the datasets you'd like to work with using a command like
python generate_acts.py --model llama-13b --layers 8 10 12 --datasets cities neg_cities --device cuda:0
These activations will be stored in the acts directory. If you want to save activations for all layers, simply use --layers -1.
Note that it is also possible to use NNsight to run inference remotely. To do this, join the NDIF Discord community and request an API key. You can then use --device remote when running any of the scripts.
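
If you need to generate activations for several configurations, it can help to assemble the command programmatically. The following is a minimal sketch, not part of this repo: the helper name build_generate_acts_cmd and its defaults are hypothetical, and it only uses the flags shown above (--model, --layers, --datasets, --device).

```python
import shlex


def build_generate_acts_cmd(model, datasets, layers=(-1,), device="cuda:0"):
    """Return the argument list for one generate_acts.py invocation.

    Hypothetical helper: it only assembles the flags documented in this
    README and does not inspect the script itself.
    """
    if not datasets:
        raise ValueError("at least one dataset name is required")
    cmd = ["python", "generate_acts.py", "--model", model]
    # --layers -1 (the default here) requests activations for all layers.
    cmd += ["--layers", *[str(layer) for layer in layers]]
    cmd += ["--datasets", *datasets]
    cmd += ["--device", device]
    return cmd


# Reproduces the example command from this README.
cmd = build_generate_acts_cmd("llama-13b", ["cities", "neg_cities"], layers=[8, 10, 12])
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repo root. Passing device="remote" yields the NNsight variant described above.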
This directory contains the following files:
- acts: the activations will be saved to this directory
- data_processing: StereoSet and CrowS-Pairs data, including processing scripts
- datasets: .csv files with labeled data
- experimental_outputs: the results will be saved to this directory
- figures: all the figures produced in the thesis
- job_files: example job files for running the scripts on SLURM
- bias_patching.py: script for running the patching experiment
- config.ini: specify which models to use here
- dataexplorer.ipynb: notebook for generating PCA visualizations
- generalization.ipynb: notebook for running the generalization experiment
- generate_acts.py: script for generating model activations
- interventions: script for running the intervention experiment
- patching_nb.py and patching_nb.ipynb: for creating a figure from the patching experiment results
- patching_prompts.txt: prompts used in the thesis for all patching experiments
- probes.py: definitions of logistic regression and mass-mean probes
- uncertainties.py: script for calculating uncertainties of the normalized indirect effect
- utils.py and visualization_utils.py: utilities for managing datasets and producing visualizations
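
For readers unfamiliar with the mass-mean probes mentioned in the list above, the core idea can be sketched in a few lines. This is a simplified pure-Python illustration, not the implementation in probes.py: the function names and toy data are hypothetical, and the sketch has no bias term and no covariance correction, so it assumes the activations are roughly centered.

```python
import math


def mass_mean_direction(acts, labels):
    """Return mean(acts with label 1) - mean(acts with label 0)."""
    pos = [a for a, y in zip(acts, labels) if y == 1]
    neg = [a for a, y in zip(acts, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    dim = len(acts[0])
    pos_mean = [sum(a[i] for a in pos) / len(pos) for i in range(dim)]
    neg_mean = [sum(a[i] for a in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def probe_probability(direction, act):
    """Sigmoid of the projection of one activation onto the direction."""
    score = sum(d * x for d, x in zip(direction, act))
    return 1.0 / (1.0 + math.exp(-score))


# Toy 2-D "activations"; real ones would be loaded from the acts directory.
acts = [[1.0, 0.0], [3.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
labels = [1, 1, 0, 0]
direction = mass_mean_direction(acts, labels)
print(direction)  # [4.0, 0.0]
print(probe_probability(direction, [1.0, 5.0]) > 0.5)  # True
```

Unlike logistic regression, this direction requires no iterative training: it is simply the difference between the two class means.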