When you are installing this for the first time, run:
conda env create --file environment.yml
After that, and on every new session, run:
conda activate mscan2
During development, add dependencies that you care about to environment.yml. Then run:
conda env update --file environment.yml --prune
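As a convenience, the create and update steps can be folded into one idempotent snippet. This is just a sketch, assuming the environment is named mscan2 as above:
# Create the environment if it doesn't exist yet, otherwise update it in place.
if conda env list | grep -qw mscan2; then
    conda env update --file environment.yml --prune
else
    conda env create --file environment.yml
fi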
The SCAN repo provides the data already split, while the MCD splits are distributed as JSON descriptions of how to split. The following script converts the SCAN splits to the same format as the MCD ones.
./scripts/create_split_files.sh
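If you want to sanity-check the conversion, you can pretty-print one of the split description files it wrote. The exact output location is not documented here, so the find expression is just one way to locate them:
find . -name '*.json' | head      # locate the generated split descriptions
python -m json.tool <one-of-them> # pretty-print one file to check its structure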
The following translates the English data into various languages, then uses the JSON split descriptions to produce datasets for each (language, split) pair.
./scripts/preprocess_all_scan.sh
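A quick way to confirm the preprocessing worked is to spot-check one (language, split) pair; the en/simple paths below are the same ones used in the inference example further down:
# Count examples and peek at the first few lines of one dataset.
wc -l data/output/en/simple/*.txt
head data/output/en/simple/train.txt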
Set an environment variable with your Hugging Face token:
export HF_TOKEN="hf_..."
Then run an experiment, e.g.:
python src/inference.py --model-name "bigscience/bloom" --train data/output/en/simple/train.txt --test data/output/en/simple/test.txt --output data/output/results/playground/results.json --context-size 2 --num-queries 1
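To sweep several (language, split) pairs with the same settings, the call above can be wrapped in a small loop. This is only a sketch: the language and split names are the ones that appear in this README, and the output filename is illustrative, so substitute whatever preprocess_all_scan.sh actually produced:
for lang in en; do
  for split in simple mcd1; do
    python src/inference.py \
      --model-name "bigscience/bloom" \
      --train  "data/output/${lang}/${split}/train.txt" \
      --test   "data/output/${lang}/${split}/test.txt" \
      --output "data/output/results/playground/${lang}-${split}.json" \
      --context-size 2 --num-queries 1
  done
done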
Set an environment variable with your OpenAI token:
export OPENAI_API_KEY="sk-..."
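Before launching anything that bills against the API, it can be worth checking that the key is actually exported in the current shell (a one-line sketch using bash parameter expansion):
# Fails loudly if OPENAI_API_KEY is unset or empty; does nothing otherwise.
: "${OPENAI_API_KEY:?OPENAI_API_KEY is not set}"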
Edit condor-run.sh with whatever you want to do in the Condor task, then run:
condor_submit condor-task.cmd
To view the queue and check whether your job is running:
condor_q
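If a job sits idle or needs to be stopped, HTCondor's standard tooling helps; replace <job_id> with the ID shown by condor_q:
# Explain why an idle job has not matched a machine yet.
condor_q -better-analyze <job_id>
# Remove a job from the queue.
condor_rm <job_id>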
Start by generating all SLURM submission scripts:
python src/generate_slurm.py
To run a single SLURM job, do e.g.:
sbatch scripts/generated/slurm/bloomz/run-bloomz-en-mcd1.slurm
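If you want to validate a script before actually queueing it, sbatch can do a dry run:
# Validate the batch script and print an estimated start time without submitting.
sbatch --test-only scripts/generated/slurm/bloomz/run-bloomz-en-mcd1.slurm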
Then check job status with:
squeue --me
# or, for more information:
squeue --me -o "%22S %.12i %.45j %.10T %.10M %.30R" --sort="M,j"
And check job output with e.g.:
# Replace job ID and task ID below
cat /mmfs1/gscratch/clmbr/amelie/projects/thesis_multiling_compos/data/output/results/bloomz/en/mcd1/<task_id>_<job_id>.out
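For a job that is still running, you can follow the same file live, or cancel the job entirely (standard SLURM commands; fill in the IDs as above):
tail -f /mmfs1/gscratch/clmbr/amelie/projects/thesis_multiling_compos/data/output/results/bloomz/en/mcd1/<task_id>_<job_id>.out
scancel <job_id>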
To submit all generated jobs for a model at once, run e.g.:
./scripts/generated/slurm/bloomz/slurm-submit-all.sh
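That helper presumably just loops sbatch over the generated scripts for the model; a hand-rolled equivalent (sketch) would be:
# Submit every generated bloomz SLURM script in turn.
for f in scripts/generated/slurm/bloomz/*.slurm; do
  sbatch "$f"
done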