Final project for CS 517: Theory of Computation at Oregon State University

hoangle96/CS517-Final-Project

K-anonymize database through solving set cover with ILP

This is the course project for CS 517: Theory of Computation.

Contributors : Hoang Le, Rajesh Mangannavar

Aim : To k-anonymize a given database by reducing the problem to set cover and solving it with integer linear programming (ILP).

Idea : We convert the k-anonymization problem into a set cover problem, solve that set cover problem as an integer linear program with cvxpy, and then convert the set cover solution back into a k-anonymized database.
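The reduction can be illustrated on a toy table. The sketch below is not the code in cvx_solver.py; it is a minimal, standard-library-only illustration under several assumptions: anonymization is done by suppressing cells with "*", the candidate sets are all groups of k to 2k-1 rows, the cost of a group is the number of cells that must be suppressed to make its rows identical, and the chosen groups are required to be disjoint (a set partitioning variant of set cover, which makes the conversion back to a table trivial). An exhaustive search stands in for the ILP solver, so it is only usable on a handful of rows. All function names are hypothetical.

```python
from itertools import combinations


def group_cost(rows, group):
    """Cells suppressed if every row in `group` must look identical."""
    n_attrs = len(rows[0])
    differing = sum(
        1 for j in range(n_attrs) if len({rows[i][j] for i in group}) > 1
    )
    return differing * len(group)


def candidate_groups(rows, k):
    """All row subsets of size k..2k-1; a larger group can always be
    split into two groups of size >= k without raising the cost."""
    max_size = min(2 * k - 1, len(rows))
    return [
        frozenset(c)
        for size in range(k, max_size + 1)
        for c in combinations(range(len(rows)), size)
    ]


def cheapest_partition(rows, k):
    """Exhaustive stand-in for the ILP: pick disjoint groups that cover
    every row at minimum total cost. Returns (cost, list_of_groups)."""
    groups = candidate_groups(rows, k)
    costs = {g: group_cost(rows, g) for g in groups}
    best = (float("inf"), None)

    def search(uncovered, chosen, cost):
        nonlocal best
        if cost >= best[0]:
            return  # cannot beat the best solution found so far
        if not uncovered:
            best = (cost, list(chosen))
            return
        first = min(uncovered)  # always cover the lowest uncovered row next
        for g in groups:
            if first in g and g <= uncovered:
                chosen.append(g)
                search(uncovered - g, chosen, cost + costs[g])
                chosen.pop()

    search(frozenset(range(len(rows))), [], 0)
    return best


def anonymize(rows, partition):
    """Suppress every attribute on which the rows of a group disagree."""
    out = [list(r) for r in rows]
    for g in partition:
        for j in range(len(rows[0])):
            if len({rows[i][j] for i in g}) > 1:
                for i in g:
                    out[i][j] = "*"
    return [tuple(r) for r in out]


if __name__ == "__main__":
    # Made-up rows for illustration only (not taken from adult.csv).
    table = [
        ("Bachelors", "Male", 39),
        ("Bachelors", "Male", 50),
        ("HS-grad", "Female", 38),
        ("HS-grad", "Male", 38),
    ]
    cost, partition = cheapest_partition(table, k=2)
    print(cost, anonymize(table, partition))
```

In an ILP version of the same idea, each candidate group would get one 0/1 variable, the objective would be the sum of the selected groups' costs, and one constraint per row would require that the row is covered by a selected group.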

Package requirements :

1. cvxpy==1.0.25
2. pandas
3. numpy

Running the experiment :

python cvx_solver.py -filename <filename.csv> -M <m> -N <n> -K <k>

Here -filename is the path to the database in .csv format, -M is the number of rows to anonymize, -N is the number of public attributes that need to be anonymized, and -K is the anonymity parameter k, meaning that every row in the output must be indistinguishable from at least k-1 other rows on the public attributes. The result is saved in "k_anonymized_df.csv", with k replaced by the value of -K. We also include the Adult dataset from the UCI Machine Learning Repository.

Example Run :

python cvx_solver.py -filename adult.csv -M 10 -K 2 -N 5

This 2-anonymizes the first 10 rows of the Adult dataset, treating the first 5 attributes as public attributes.

Run python cvx_solver.py --help to list the available options.
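One way to sanity-check a result is to count how often each combination of public attribute values occurs in the output. The helper below is a hypothetical sketch, not part of the repository. It assumes that the public attributes are the first N columns of the output file and that the file starts with a header row; neither assumption is confirmed by this README, so adjust as needed.

```python
import csv
from collections import Counter


def is_k_anonymous(rows, n_public, k):
    """True if every combination of the first `n_public` values
    appears in at least `k` rows."""
    counts = Counter(tuple(row[:n_public]) for row in rows)
    return all(c >= k for c in counts.values())


def check_csv(path, n_public, k):
    """Load a CSV file and test it for k-anonymity.
    Assumption: the first line is a header and is skipped."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return is_k_anonymous(list(reader), n_public, k)
```

For the example run above, the call would look like check_csv("2_anonymized_df.csv", n_public=5, k=2), given the output naming described earlier.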
