Fair-Link-Prediction

Project for Data Mining, Basic Course (ID2211) at KTH.

In this project we explore whether the MovieLens-100K data exhibits gender bias, by training and evaluating different classical and neural link prediction methods.

Methods used for Link-Prediction:

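GNNs (GATConv, GINConv, GraphConv, SAGEConv, TransformerConv)
Network-based methods (common neighbors, Adamic-Adar, Jaccard, preferential attachment)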
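
As a rough illustration of the network-based scores, the sketch below computes them for a candidate (user, movie) link on a bipartite graph. It is not taken from the project notebooks: the helper names `build_adjacency` and `bipartite_scores` are hypothetical, and the two-hop definition of the movie's neighborhood is an assumption about how the "Bi-" variants handle the fact that a user and a movie never share a direct neighbor.

```python
from math import log

def build_adjacency(edges):
    """Build user->movies and movie->users neighbor sets from (user, movie) pairs."""
    user_movies, movie_users = {}, {}
    for u, m in edges:
        user_movies.setdefault(u, set()).add(m)
        movie_users.setdefault(m, set()).add(u)
    return user_movies, movie_users

def bipartite_scores(u, m, user_movies, movie_users):
    """Score a candidate (user, movie) link.

    Assumption: the user's movies are compared with the movies rated by the
    other viewers of the candidate movie (a two-hop neighborhood).
    """
    gamma_u = user_movies.get(u, set())
    viewers = movie_users.get(m, set()) - {u}
    two_hop = set()
    for v in viewers:
        two_hop |= user_movies[v]
    two_hop.discard(m)

    common = gamma_u & two_hop
    union = gamma_u | two_hop
    # Adamic-Adar down-weights shared movies that many users have rated.
    adamic_adar = sum(
        1.0 / log(len(movie_users[z])) for z in common if len(movie_users[z]) > 1
    )
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "adamic_adar": adamic_adar,
        "preferential_attachment": len(gamma_u) * len(movie_users.get(m, set())),
    }
```

A higher score is read as a more likely link; thresholding or ranking these scores then gives the predictions that the metrics are computed on.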

Performance metrics for different feature ablations

SAGEConv and other models were tested with feature sets chosen to balance accuracy against the fairness metrics. Mean aggregation is used wherever necessary, as it leads to more generalized features.
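
To make the mean-aggregation step concrete, here is a minimal sketch in plain Python. It shows only the neighbor-averaging part of a message-passing layer, without the learned weight matrices that a layer such as SAGEConv applies; the function name `mean_aggregate` and the dictionary-based graph representation are illustrative assumptions, not the project's code.

```python
def mean_aggregate(features, neighbors):
    """Average the feature vectors of each node's neighbors.

    features:  dict node -> list of floats (all the same length)
    neighbors: dict node -> list of neighboring nodes
    Nodes without neighbors receive a zero vector.
    """
    aggregated = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        if not nbrs:
            aggregated[node] = [0.0] * dim
            continue
        aggregated[node] = [
            sum(features[n][k] for n in nbrs) / len(nbrs) for k in range(dim)
        ]
    return aggregated
```

Because the sum is divided by the neighborhood size, a user with a few ratings and a user with hundreds of ratings produce aggregated vectors on the same scale, which is one plausible reading of why the mean gives more generalized features here.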

Feature Ablation	AUC	Accuracy	F1	SP	EO
Age + Occ + Gender, Movie	0.8643	0.7998	0.6809	0.036829	0.047949
Age + Occ, Movie	0.8635	0.7971	0.6837	0.026296	0.030179
Age, Movie	0.8061	0.7310	0.4290	0.020913	0.021649
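
The README does not spell out SP and EO. Assuming they denote the statistical parity gap and the equal opportunity gap between the two gender groups (the usual definitions), they could be computed as in the sketch below. The function names and the 0/1 group encoding are hypothetical.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the entries selected by mask."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    return sum(selected) / len(selected) if selected else 0.0

def statistical_parity_gap(preds, groups):
    """|P(pred = 1 | group 0) - P(pred = 1 | group 1)|."""
    rate0 = positive_rate(preds, [g == 0 for g in groups])
    rate1 = positive_rate(preds, [g == 1 for g in groups])
    return abs(rate0 - rate1)

def equal_opportunity_gap(preds, labels, groups):
    """Gap in true positive rate between the groups (only true label 1 counts)."""
    tpr0 = positive_rate(preds, [g == 0 and y == 1 for g, y in zip(groups, labels)])
    tpr1 = positive_rate(preds, [g == 1 and y == 1 for g, y in zip(groups, labels)])
    return abs(tpr0 - tpr1)
```

Under these definitions both values lie between 0 and 1, and lower is fairer, which matches how the SP and EO columns are read in the tables.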

Performance measures for the different models tested

Model	AUC	Accuracy	F1-score	Precision	Recall
Bi-common-neighbors	0.8584	0.7424	0.7001	0.5888	0.8632
Bi-Adamic-Adar	0.8599	0.7436	0.7017	0.5907	0.8642
Bi-Jaccard	0.8695	0.7448	0.7027	0.5905	0.8674
Preferential Attachment	0.8450	0.7273	0.6781	0.5653	0.8472

Architecture	Layers	AUC	Accuracy	F1-score	Precision	Recall	SP	EO
GATConv	3	0.8013	0.7439	0.5956	0.5656	0.6288	0.007911	0.007694
	4	0.8306	0.7609	0.6587	0.6923	0.6283	0.009295	0.008364
	5	0.9215	0.8504	0.7801	0.7963	0.7647	0.024848	0.018533
	6	0.7706	0.6751	0.0635	0.0331	0.8097	0.004536	0.013637
GINConv	3	0.7118	0.7025	0.5692	0.5895	0.5502	0.026767	0.001744
	4	0.8606	0.8041	0.7365	0.8214	0.6675	0.016521	0.006719
	5	0.7960	0.5924	0.5766	0.8324	0.4410	0.004849	0.000996
	6	0.5927	0.6921	0.5363	0.5342	0.5384	0.012549	0.004277
GraphConv	3	0.8822	0.8142	0.7047	0.6650	0.7494	0.018200	0.006651
	4	0.8853	0.8144	0.6977	0.6426	0.7632	0.021984	0.023273
	5	0.8822	0.8130	0.7288	0.7538	0.7053	0.011697	0.005238
	6	0.8914	0.8187	0.7355	0.7564	0.7158	0.030041	0.000958
SAGEConv	3	0.8767	0.8100	0.6994	0.6632	0.7399	0.009929	0.016642
	4	0.8782	0.8116	0.7193	0.7244	0.7143	0.028708	0.008952
	5	0.8804	0.8101	0.6992	0.6621	0.7406	0.016706	0.003372
	6	0.8747	0.8009	0.7135	0.7439	0.6855	0.017205	0.001091
TransformerConv	3	0.8843	0.8130	0.7243	0.7372	0.7119	0.010989	0.012458
	4	0.8855	0.8149	0.7249	0.7316	0.7183	0.015846	0.011875
	5	0.8780	0.8056	0.7044	0.6948	0.7143	0.023165	0.010082
	6	0.8901	0.8193	0.7279	0.7249	0.7309	0.026621	0.012776

Fairness De-biasing Methods

Three post-processing methods were used to de-bias the differences in predicted ratings between male and female users. They were applied only to the selected models, which showed a good balance between performance and fairness metrics before de-biasing.

Distribution Weights: the rates of movie ratings (edge rates) of male and female users are used as weights to re-weight the probability distribution of the softmax predictions.
Naive Optimized Weights: similar, but the weights are found by reducing the difference between SP and EO in a naive, grid-search-like fashion.
Linear Optimized Weights (from scipy): a linear cost function that finds the weights by balancing SP, EO and F1.
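
The grid-search variant can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the per-group weights multiply the predicted probabilities before thresholding, the cost is taken to be SP + EO (using the usual gap definitions), and the names `gaps`, `reweight`, `grid_search_weights` and the weight grid are all hypothetical.

```python
from itertools import product

def gaps(preds, labels, groups):
    """Return (SP gap, EO gap) between group 0 and group 1 for binary predictions."""
    def rate(mask):
        chosen = [p for p, keep in zip(preds, mask) if keep]
        return sum(chosen) / len(chosen) if chosen else 0.0
    sp = abs(rate([g == 0 for g in groups]) - rate([g == 1 for g in groups]))
    eo = abs(
        rate([g == 0 and y == 1 for g, y in zip(groups, labels)])
        - rate([g == 1 and y == 1 for g, y in zip(groups, labels)])
    )
    return sp, eo

def reweight(probs, groups, weights, threshold=0.5):
    """Scale each probability by its group's weight, clip at 1, then threshold."""
    return [int(min(1.0, p * weights[g]) >= threshold) for p, g in zip(probs, groups)]

def grid_search_weights(probs, labels, groups, grid=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Try every (w0, w1) pair on the grid and keep the one with the lowest SP + EO."""
    best_weights, best_cost = (1.0, 1.0), float("inf")
    for w0, w1 in product(grid, repeat=2):
        preds = reweight(probs, groups, (w0, w1))
        cost = sum(gaps(preds, labels, groups))
        if cost < best_cost:
            best_weights, best_cost = (w0, w1), cost
    return best_weights, best_cost
```

A cost built only from fairness gaps can be driven to zero by trivial predictions (for example, predicting 0 for everyone), so an accuracy term such as F1 would normally be added to the cost, as the linear variant in the list above does.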

GNN	MAP	F1	AUC	Pr	Rec	SP(Overall)	EO(Overall)
GCN-3L + Dist. Weights	0.2849	0.4593	0.8590	0.31	0.8181	0.00945	0.001238
GCN-3L + Naive Optimized Weighting	0.5232	0.6769	0.8862	0.60	0.777	0.0174	0.0114
GCN-3L + Linear Opt. Weights	0.6736	0.74	0.8849	0.8033	0.6859	0.0270	0.001
GAT-5L + Dist. Weights	0.12	0.2151	0.7911	0.1206	0.9926	0.000626	0.033827
GAT-5L + Naive Optimized Weighting	0.5289	0.6873	0.8201	0.5933	0.8168	0.0126	0.0128
GAT-5L + Linear Opt. Weights	0.6349	0.7643	0.87921	0.7051	0.8334	0.0219	0.0034

The figure shows how the probability distributions of male and female ratings shift in GATConv-5L. Re-weighting via linear optimization shifts the density of the female rating probabilities, almost like a lifter in speech processing, and narrows the probability distribution of the male ratings. For GraphConv-3L we observe a similar change in density. Given its better parity scores, shifting the pdf of the female group to the right and narrowing the male pdf appears to give somewhat better parity between the two groups.

The bi-modal nature of the predictions is due to the experimental choice to split ratings into 0/1 labels.
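
A minimal sketch of such a split is shown below. The README does not state the cut-off, so the threshold of 4 is purely an assumption for illustration, as is the function name `binarize_ratings`.

```python
def binarize_ratings(ratings, threshold=4):
    """Map 1-5 star ratings to 0/1 labels: 1 if rating >= threshold, else 0.

    The threshold value is hypothetical; the source only says ratings
    were split into 0/1 labels.
    """
    return [1 if r >= threshold else 0 for r in ratings]
```

A model trained on labels like these is pushed toward predicting probabilities near 0 or near 1, which is consistent with the two peaks seen in the predicted distributions.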

Conclusion

We notice that although there may be a slight loss in the performance metrics, there is a clear improvement in the fairness metrics. Simple post-processing methods seem quite useful; however, further fine-tuning and other methods, such as changes to the latent distribution, might help. The number of model parameters and layers does not seem to correlate with the fairness metrics, but the aggregation schemes of the different GNNs, together with our network, could in theory be an influence.

Authors: see the project report (ID2211_Project_Report.pdf).
