Add jupyter notebook with common neighbours metric on football dataset #342

iandioch · 2019-02-12T01:31:28Z

Connects to #304.

This PR implements the common neighbours metric and runs it on one of the Pajek datasets used in the paper in #313. The notebook loads the dataset, computes the similarity matrix, and computes the AUC (area under the receiver operating characteristic curve), which measures the trade-off between true positives and false positives. The formula used to calculate the AUC (based on n1, n2, n3) comes from the same paper.
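For reviewers who want the gist without opening the notebook, here is a minimal sketch of the two computations. It is not the notebook's code. It assumes an undirected graph stored as a symmetric 0/1 adjacency matrix, and it assumes n1, n2 and n3 count the comparisons in which a held-out edge scores higher than, equal to, or lower than a non-edge. That reading of n1, n2, n3 is a guess at the paper's notation; the function names are hypothetical.

```python
import numpy as np


def common_neighbours(adj):
    """Return S where S[i][j] is the number of nodes adjacent to both i and j.

    Assumes adj is a symmetric 0/1 adjacency matrix (undirected graph).
    """
    adj = np.asarray(adj, dtype=np.int64)
    sim = adj @ adj           # (A @ A)[i][j] counts the paths i - x - j
    np.fill_diagonal(sim, 0)  # self-similarity is not meaningful here
    return sim


def count_comparisons(sim, held_out_edges, non_edges):
    """Compare every held-out edge score against every non-edge score.

    ASSUMPTION: n1 = held-out edge scores higher, n2 = tie, n3 = lower.
    """
    n1 = n2 = n3 = 0
    for a, b in held_out_edges:
        for c, d in non_edges:
            if sim[a][b] > sim[c][d]:
                n1 += 1
            elif sim[a][b] == sim[c][d]:
                n2 += 1
            else:
                n3 += 1
    return n1, n2, n3


def auc_from_counts(n1, n2, n3):
    """AUC under the assumption above: wins plus half the ties, over all comparisons."""
    return (n1 + 0.5 * n2) / (n1 + n2 + n3)


# Toy example: the 4-cycle 0-1-2-3-0.
adj = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
sim = common_neighbours(adj)
counts = count_comparisons(sim, held_out_edges=[(0, 2)], non_edges=[(0, 1), (1, 3)])
print(sim)
print(counts, auc_from_counts(*counts))
```

Comparing every pair exhaustively is only practical on small graphs; sampling pairs at random would be the usual alternative on a larger one.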

The notebook is added in a new research/ directory, which we can use for a lot of non-prod code in future, I guess: experiments, explorations, etc.

What the actual implementation of this metric in Rabble will need is a microservice that computes the similarity matrix, with some extra bits: e.g. updating the matrix regularly (as our follow graph changes), an API for actually getting recommendations from the similarity matrix (i.e. return some number of `j`s where S[i][j] is maximal for a given i), and maybe a slightly different accuracy measurement, depending on how easy it is to apply AUC to our own database. Some of the code here might be replicated there later, IDK for sure.
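The recommendation lookup described above could look roughly like the sketch below. It is only an illustration, not a proposed design for the microservice. The function name `recommend` and the parameter `k` are hypothetical, and skipping users that `i` already follows, as well as candidates with a score of zero, are assumptions rather than anything decided in this PR.

```python
import numpy as np


def recommend(sim, adj, i, k=3):
    """Return up to k indices j with the largest sim[i][j].

    ASSUMPTIONS: node i itself, nodes i is already connected to, and
    candidates with a score of 0 are never recommended.
    """
    scores = np.asarray(sim, dtype=float)[i].copy()
    scores[i] = -np.inf                       # never recommend i to itself
    scores[np.asarray(adj)[i] > 0] = -np.inf  # skip existing connections
    # Stable sort on the negated scores: highest first, ties by lowest index.
    order = np.argsort(-scores, kind="stable")
    return [int(j) for j in order[:k] if scores[j] > 0]


# Toy example: the 4-cycle 0-1-2-3-0 and its common neighbours matrix.
adj = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
])
sim = adj @ adj
print(recommend(sim, adj, 0, k=2))
```

Sorting a whole row per request is fine at this scale. A real service would more likely cache the top candidates per user whenever the matrix is refreshed.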

Our results in this notebook:

vs. the paper's results with this metric on the same dataset:

Compare the Average row of our output with the [n1 n2 n3 AUC] in the Average column from the paper.

iandioch · 2019-02-12T01:37:39Z

FYI, GitHub will actually render an ipynb file if you click the View file button in the changeset, so you don't necessarily need to run jupyter yourself to review.

devoxel

It's pretty hard to review in its current form. It isn't production code, obviously, but it'd be nice if it were easier to read.

EDIT: Missed how to view the file properly

research/common_neighbours_football.ipynb

iandioch added 3 commits February 12, 2019 02:10

Add CN ipynb file

4b59d71

Add football dataset

985914b

Add datasets readme

d9b764e

iandioch requested a review from SailSlick February 12, 2019 01:31

devoxel reviewed Feb 12, 2019

research/common_neighbours_football.ipynb (resolved)

devoxel approved these changes Feb 13, 2019

iandioch merged commit 69f6c88 into master Feb 13, 2019

iandioch deleted the n/cn_metric branch February 13, 2019 19:30

devoxel mentioned this pull request Feb 14, 2019

Sprint 4 Sprint Log #299

Closed
