Dataset demographic bias metrics

This package implements several metrics for dataset demographic bias. The metrics are organized as follows:

Representational bias metrics (dataset_bias_metrics.representational)
Stereotypical bias metrics at the global level (dataset_bias_metrics.stereotypical)
Stereotypical bias metrics at the local level (dataset_bias_metrics.local_stereotypical)
Some visualization tools (dataset_bias_metrics.visualization)

For details on the included bias metrics, see our paper "Metrics for Dataset Demographic Bias: A Case Study on Facial Expression Recognition", available in IEEE Transactions on Pattern Analysis and Machine Intelligence.

Installation

Binary installers for the latest released version are available at the Python Package Index (PyPI).

pip install dataset-bias-metrics

Usage

Including the libraries

import os

import pandas as pd
from IPython.display import display  # built in to notebooks; needed in plain scripts

import dataset_bias_metrics as dbm

Loading example datasets

The .csv files available in example_data correspond to the datasets analyzed in the paper. Only the demographic data with no identifying information is provided, and the rows are scrambled to avoid the identification of specific samples.

datasets = {}
for filename in os.listdir('example_data'):
    if filename.endswith(".csv"):
        ds = pd.read_csv(os.path.join('example_data', filename))
        datasets[filename.split('.')[0]] = ds
    
display(datasets['raf-db2'])

	age	race	gender	label
0	20-29	Indian	Male	disgust
1	20-29	White	Male	disgust
2	30-39	White	Male	angry
3	60-69	Southeast Asian	Female	happy
4	40-49	East Asian	Male	neutral
...	...	...	...	...
15121	30-39	White	Male	sad
15122	30-39	East Asian	Male	happy
15123	20-29	Black	Male	fear
15124	30-39	White	Female	happy
15125	20-29	White	Female	neutral

15126 rows × 4 columns

Representational bias

The following is an example of the application of the representational bias metrics:

# Application of a single metric
dbm.representational.ens(datasets['adfes'], 'race')

2.478206948646503

# It also supports combined components
dbm.representational.ens(datasets['adfes'], ['age', 'race'])

3.0404964294246533
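To give an intuition for what the values above measure, here is a minimal standalone sketch of an effective number of species computation. It assumes ENS is the exponential of the Shannon entropy of the group shares (the usual Hill number of order 1); the package's own implementation may differ in details. The function name `effective_number_of_species` and the toy data are hypothetical and not part of `dataset_bias_metrics`.

```python
import math
from collections import Counter


def effective_number_of_species(values):
    """Assumed definition: exp of the Shannon entropy of the group shares."""
    counts = Counter(values)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


# Hypothetical toy data, not taken from example_data
balanced = ['A', 'B', 'C', 'D'] * 25     # four equally sized groups
skewed = ['A'] * 97 + ['B', 'C', 'D']    # one dominant group

print(effective_number_of_species(balanced))  # close to 4 for four equal groups
print(effective_number_of_species(skewed))    # much closer to 1

# Combined components can be treated as tuples of values
ages = ['20-29', '20-29', '30-39', '30-39']
races = ['White', 'Black', 'White', 'Black']
print(effective_number_of_species(zip(ages, races)))
```

Under this definition the result ranges from 1 (a single group) to the number of groups present (all groups equally represented), so higher values indicate a more even representation. A DataFrame column such as `datasets['adfes']['race']` could be passed in the same way.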

# Comparative analysis across datasets, with visualization
component = 'race'
# Initialize with floats so metric values can be assigned without dtype coercion
repbias = pd.DataFrame(0.0,
                       index=list(datasets.keys()),
                       columns=list(dbm.representational.metrics.keys()))

for dsname, ds in datasets.items():
    for metricname, m in dbm.representational.metrics.items():
        repbias.loc[dsname, metricname] = m(ds, component)

display(f'{component.capitalize()} component')
dbm.visualization.plotTable(repbias.T, normalizeAxis=1, sort=None)

'Race component'

Stereotypical bias (global)

The following is an example of the application of the global stereotypical bias metrics:

# Application of a single metric
dbm.stereotypical.cramersv(datasets['expw'], 'race', 'label')

0.04104288518527493
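As a rough illustration of what a global stereotypical metric captures, the following standalone sketch computes the textbook (uncorrected) Cramér's V from two paired sequences of categories. It is an assumption that this matches the package's `cramersv`; the package may apply corrections or handle edge cases differently. The function name `cramers_v` and the toy data are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two paired categorical sequences."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) pair
    row = Counter(xs)              # marginal counts of the first component
    col = Counter(ys)              # marginal counts of the second component

    # Pearson chi-squared statistic against the independence expectation
    chi2 = 0.0
    for r, nr in row.items():
        for c, nc in col.items():
            expected = nr * nc / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: group and label are independent in the first case
# and perfectly associated in the second
groups = ['a', 'a', 'b', 'b']
print(cramers_v(groups, ['x', 'y', 'x', 'y']))  # no association
print(cramers_v(groups, ['x', 'x', 'y', 'y']))  # full association
```

Values near 0 indicate that the two components are close to independent, while values near 1 indicate that one component largely determines the other.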

# Comparative analysis across datasets, with visualization

c1, c2 = ('race', 'label')
# Initialize with floats so metric values can be assigned without dtype coercion
stereobias = pd.DataFrame(0.0,
                          index=list(datasets.keys()),
                          columns=list(dbm.stereotypical.metrics.keys()))

for dsname, ds in datasets.items():
    for metricname, m in dbm.stereotypical.metrics.items():
        stereobias.loc[dsname, metricname] = m(ds, c1, c2)

display(f'{c1.capitalize()}-{c2.capitalize()} components')
dbm.visualization.plotTable(stereobias.T, normalizeAxis=1)

'Race-Label components'

Stereotypical bias (local)

The following is an example of the application of the local stereotypical bias metrics:

# Single metric application with visualization
ds = datasets['expw']
c1, c2 = ('race', 'label')

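# Name the metric explicitly rather than reusing a variable left over from an earlier loop
metricname = 'duchersz'
matrix = dbm.local_stereotypical.duchersz(ds, c1, c2)

display(f'{c1.capitalize()}-{c2.capitalize()} components, {metricname}')
dbm.visualization.plotMatrix(matrix)

'Race-Label components, duchersz'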
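To make the content of such a local matrix more concrete, here is a minimal standalone sketch that computes one score per pair of groups. It assumes the commonly cited piecewise definition of Ducher's Z, which rescales the gap between the joint probability and the independence expectation into the range [-1, 1]. Whether the package's `duchersz` follows exactly this form is an assumption; the function name `duchers_z` and the toy data are hypothetical.

```python
from collections import Counter


def duchers_z(xs, ys):
    """Assumed Ducher's Z for every (x, y) pair of groups, as a dict."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = {a: c / n for a, c in Counter(xs).items()}
    py = {b: c / n for b, c in Counter(ys).items()}

    z = {}
    for a, pa in px.items():
        for b, pb in py.items():
            pab = joint.get((a, b), 0) / n
            indep = pa * pb  # joint probability expected under independence
            if pab > indep:
                # Overrepresentation, scaled by the maximum possible excess
                z[(a, b)] = (pab - indep) / (min(pa, pb) - indep)
            elif pab < indep:
                # Underrepresentation, scaled by the maximum possible deficit
                z[(a, b)] = (pab - indep) / (indep - max(0.0, pa + pb - 1))
            else:
                z[(a, b)] = 0.0
    return z


# Hypothetical toy data: each group always receives the same label
groups = ['a', 'a', 'b', 'b']
labels = ['x', 'x', 'y', 'y']
for pair, score in sorted(duchers_z(groups, labels).items()):
    print(pair, score)
```

Under this definition, positive scores mark pairs that occur more often than independence would predict, negative scores mark pairs that occur less often, and 0 marks pairs that match the independence expectation.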
