ameliereymond/thesis_multiling_compos

Dependencies

When you are installing this for the first time, run:

conda env create --file environment.yml

After that, and on every new session, run:

conda activate mscan2

During development, add dependencies that you care about to environment.yml. Then run:

conda env update --file environment.yml --prune
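
For reference, a minimal environment.yml might look like the following. Only the environment name mscan2 is taken from this repo (it matches the conda activate command above); the channels and dependencies listed are illustrative assumptions:

name: mscan2      # must match the name used by conda activate
channels:
  - conda-forge   # assumed channel
dependencies:
  - python=3.10   # assumed version
  - requests      # assumed dependency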

Usage

Create split files

The SCAN repository provides the data pre-split, while the MCD splits are distributed as JSON descriptions of how to split the data. The following script converts the SCAN splits to the same JSON format as the MCD splits.

./scripts/create_split_files.sh
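
Each split description is a JSON file of example indices. Assuming the format used by the MCD splits in the compositional generalization literature (the trainIdxs/testIdxs key names are an assumption here, not verified against this repo), a tiny split file would look like:

{
  "trainIdxs": [0, 2, 3],
  "testIdxs": [1, 4]
}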

Preprocess

The following script translates the English data into various languages, then uses the JSON split descriptions to produce a dataset for each (language, split) pair.

./scripts/preprocess_all_scan.sh
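
Conceptually, applying a split description to a translated dataset amounts to selecting lines by index. A minimal sketch, where write_split is a hypothetical helper rather than the repository's actual code, and the trainIdxs/testIdxs keys follow the split format sketched above:

import json

def write_split(examples, split_path, out_train, out_test):
    """Select train/test examples by index, per the split description."""
    with open(split_path) as f:
        split = json.load(f)
    with open(out_train, "w") as f:
        f.writelines(examples[i] for i in split["trainIdxs"])
    with open(out_test, "w") as f:
        f.writelines(examples[i] for i in split["testIdxs"])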

Inference on the Hugging Face Inference API

Set an environment variable with your Hugging Face token:

export HF_TOKEN="hf_..."

Then run an experiment, e.g.:

python src/inference.py --model-name "bigscience/bloom" --train data/output/en/simple/train.txt --test data/output/en/simple/test.txt --output data/output/results/playground/results.json --context-size 2 --num-queries 1
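
For orientation, a few-shot query against the hosted Inference API reduces to a single POST request. A minimal sketch, assuming SCAN's IN:/OUT: example format; the actual prompt construction in src/inference.py may differ:

import os
import requests

# Hosted Inference API endpoint for the model named above.
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

# Hypothetical few-shot prompt: two in-context train examples, then one test query.
prompt = (
    "IN: jump twice OUT: I_JUMP I_JUMP\n"
    "IN: walk OUT: I_WALK\n"
    "IN: walk twice OUT:"
)
response = requests.post(API_URL, headers=headers, json={"inputs": prompt})
print(response.json())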

Inference on the OpenAI API

Set an environment variable with your OpenAI token:

export OPENAI_API_KEY="sk-..."
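
As an illustration only (this is not the repository's actual runner, and the model name is a placeholder), the equivalent few-shot query against the OpenAI API looks roughly like:

import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "IN: jump twice OUT: I_JUMP I_JUMP\n"
    "IN: walk OUT: I_WALK\n"
    "IN: walk twice OUT:"
)
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)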

Condor

Edit condor-run.sh so that it performs whatever you want the Condor task to do, then submit it with:

condor_submit condor-task.cmd
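
The submit description condor-task.cmd might look along these lines (the output/error/log file names here are assumptions, not the repository's actual file):

executable = condor-run.sh
output     = condor.$(Cluster).$(Process).out
error      = condor.$(Cluster).$(Process).err
log        = condor.$(Cluster).log
queue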

To see the queue and check whether your job is running:

condor_q

SLURM

Start by generating all SLURM submission scripts:

python src/generate_slurm.py
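
Each generated script is a standard sbatch file. A sketch of what one might contain; the partition, resources, and output pattern are assumptions, and only the job name is inferred from the file name used in the example below:

#!/bin/bash
#SBATCH --job-name=run-bloomz-en-mcd1
#SBATCH --partition=gpu         # assumed partition
#SBATCH --gres=gpu:1            # assumed resources
#SBATCH --time=08:00:00         # assumed time limit
#SBATCH --output=%x_%j.out      # assumed output pattern

python src/inference.py ...     # model/language/split-specific arguments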

Run a single job

To submit a single SLURM job, run e.g.:

sbatch scripts/generated/slurm/bloomz/run-bloomz-en-mcd1.slurm

Then check job status with:

squeue --me

# or, for more information:
squeue --me -o "%22S %.12i %.45j %.10T %.10M %.30R" --sort="M,j"

And check job output with e.g.:

# Replace job ID and task ID below
cat /mmfs1/gscratch/clmbr/amelie/projects/thesis_multiling_compos/data/output/results/bloomz/en/mcd1/<task_id>_<job_id>.out

Run all jobs

./scripts/generated/slurm/bloomz/slurm-submit-all.sh
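
This presumably amounts to an sbatch loop over the generated scripts, along the lines of:

for script in scripts/generated/slurm/bloomz/*.slurm; do
    sbatch "$script"
done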
