Document Similarity Index Calculator

Computes a similarity index between two documents for the purpose of plagiarism detection. The algorithm is based on graph theory.
The two input documents must be named doc1.txt and doc2.txt.
Algorithm

The similarity score x between the two documents, given the optimal matching, is:

$$x = \frac{\sum_{i,j} \mathrm{matching}[i][j]}{0.5\,(l_1 + l_2)}$$

Here l1 and l2 are the numbers of sentences in document 1 and document 2, respectively, and matching is the 2-D matrix that records the optimal matching together with its edge weights. The denominator is the average sentence count of the two documents.
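
The score can be illustrated with a minimal Python sketch. It is not the repository's implementation. It assumes that `weights[i][j]` is some sentence-to-sentence similarity in [0, 1] (how the edge weights are computed is not specified here), and that `matching[i][j]` holds the edge weight for each matched sentence pair and 0 elsewhere. The helper names `optimal_matching` and `similarity_score` are hypothetical, and the matching is found by brute force, which is only practical for a handful of sentences.

```python
from itertools import permutations


def optimal_matching(weights):
    """Brute-force maximum-weight matching (assumed formulation).

    weights[i][j]: assumed similarity between sentence i of document 1
    and sentence j of document 2. Returns a matrix of the same shape with
    the weight of each matched pair and 0.0 everywhere else.
    """
    l1 = len(weights)
    l2 = len(weights[0]) if l1 else 0
    matching = [[0.0] * l2 for _ in range(l1)]
    if l1 == 0 or l2 == 0:
        return matching
    if l1 <= l2:
        # Give every row i a distinct column cols[i].
        best = max(
            permutations(range(l2), l1),
            key=lambda cols: sum(weights[i][c] for i, c in enumerate(cols)),
        )
        pairs = list(enumerate(best))
    else:
        # More rows than columns: give every column j a distinct row rows[j].
        best = max(
            permutations(range(l1), l2),
            key=lambda rows: sum(weights[r][j] for j, r in enumerate(rows)),
        )
        pairs = [(r, j) for j, r in enumerate(best)]
    for i, j in pairs:
        matching[i][j] = weights[i][j]
    return matching


def similarity_score(matching, l1, l2):
    """x = sum(matching[i][j]) / (0.5 * (l1 + l2))."""
    if l1 + l2 == 0:
        return 0.0
    total = sum(sum(row) for row in matching)
    return total / (0.5 * (l1 + l2))


# Illustrative weights: document 1 has 2 sentences, document 2 has 3.
weights = [
    [1.0, 0.2, 0.0],
    [0.1, 0.5, 0.3],
]
matching = optimal_matching(weights)
score = similarity_score(matching, l1=2, l2=3)
print(matching)
print(score)
```

With these illustrative weights the matched pairs contribute 1.0 + 0.5 = 1.5, and the average sentence count is 2.5, so the score is 0.6. Under the assumption that weights lie in [0, 1], the score cannot exceed 1, and it reaches 1 only when both documents have the same number of sentences and every matched pair has weight 1. For longer documents a polynomial-time assignment solver, such as `scipy.optimize.linear_sum_assignment` with `maximize=True`, would be a more realistic choice than brute force.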