
ShadesofBias

This repository provides scripts and code used for the Shades of Bias in Text dataset. It includes code for processing the data, and for evaluation to measure bias in language models across languages.

Data Processing

process_dataset/map_dataset.py takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw and normalizes/formats it to produce https://huggingface.co/datasets/LanguageShades/BiasShadesRaw
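For a quick look at the raw data that map_dataset.py consumes, a minimal sketch using the datasets library (you may need Hub access to the LanguageShades datasets; the splits and columns are whatever the hub dataset defines, and nothing here reproduces the actual normalization logic):

```python
# Minimal sketch: load and inspect the raw BiasShades data before processing.
from datasets import load_dataset

raw = load_dataset("LanguageShades/BiasShadesRaw")
print(raw)                        # available splits and row counts
split = next(iter(raw.values()))  # take the first split, whatever it is named
print(split.column_names)         # raw schema that map_dataset.py normalizes
print(split[0])                   # one unprocessed row
```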

process_dataset/extract_vocabulary.py takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw and aligns each statement to its corresponding template slots, printing out the results -- and how well the alignment worked -- to https://huggingface.co/datasets/LanguageShades/LanguageCorrections
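The alignment idea can be illustrated with a toy sketch: turn a slot-bearing template into a regular expression and recover the slot fillers from a concrete statement. The slot syntax ([GENDER_PL]) and the example sentence below are invented for illustration and are not the repository's actual template format or algorithm.

```python
# Toy illustration of statement-to-template-slot alignment via regex.
import re

def align(template: str, statement: str):
    """Recover slot fillers by turning the template into a regex."""
    escaped = re.escape(template)                 # e.g. "\[GENDER_PL\] are bad drivers\."
    pattern = re.sub(r"\\\[([A-Z_]+)\\\]",        # replace escaped "[SLOT]" markers
                     lambda m: f"(?P<{m.group(1)}>.+?)",
                     escaped)
    match = re.fullmatch(pattern, statement)
    return match.groupdict() if match else None

print(align("[GENDER_PL] are bad drivers.", "Women are bad drivers."))
# -> {'GENDER_PL': 'Women'}
```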

Evaluation

HF Endpoints

To use the HF Endpoints, navigate to Shades if you have access. If not, copy the .env file into your root directory.

Example Scripts

Run example_logprob_evaluate.py to iterate through the dataset for a given model and compute the log probability of biased sentences. If you have the .env, load_endpoint_url(model_name) will load the endpoint if one has been created for that model.
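As an illustration of the log-probability scoring idea (not the script itself), a causal LM can score a sentence by summing per-token log probabilities. The model name below is a placeholder; the script would instead query the endpoint returned by load_endpoint_url(model_name).

```python
# Sketch: score a sentence by the sum of token log-probabilities under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_logprob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # log P(token_t | tokens_<t), summed over the sentence
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

print(sentence_logprob("This is a biased statement."))
```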

Run generation_evaluate.py to iterate through the dataset, with each instance formatted with a specified prompt from prompts/. It is possible to specify a prompt language that differs from the original language; the prompt language defaults to English unless otherwise specified. If you have the .env, load_endpoint_url(model_name) will load the endpoint if one has been created for that model.

Add more prompts

Follow the examples in prompts/ to create a .txt file for a new prompt. The input field should be indicated with {input} in the text file, as shown in the sketch below.
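A rough sketch of how such a prompt file is used: the {input} placeholder is replaced with each dataset statement. The file name and prompt wording below are invented for illustration.

```python
# Hypothetical prompt template, mirroring what a prompts/*.txt file would contain.
prompt_template = "Do you agree with the following statement? {input}"

# In the evaluation scripts the template is read from a file instead, e.g.:
# prompt_template = open("prompts/my_prompt.txt").read()

statement = "Example statement from the dataset."
print(prompt_template.format(input=statement))
# -> Do you agree with the following statement? Example statement from the dataset.
```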

Base Models

Current Proposed Model List

'Aligned' models

Todo
