CornellDataScience/DuQI

DuQI: Duplicate Question Identification

Members: Brandon Kates, Zhao Shen, Arnav Ghosh

Objective: To create a system capable of detecting duplicate questions on Q&A platforms.

We expect our approach to help centralize the available knowledge on a single question/issue and to direct users whose questions have already been answered to the appropriate resource.

We will test a variety of duplicate question identification methods on the Quora question pairs dataset, and hope to eventually apply our findings to the classroom Q&A platform Piazza to improve the Cornell student experience.
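To make the task concrete, here is a minimal sketch of one very simple way to score a question pair: Jaccard overlap between the two questions' word sets. This is an illustrative baseline only, not one of the project's models; the function names and the 0.5 threshold are assumptions chosen for the example.

```python
import re


def tokenize(question):
    """Lowercase a question and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard_similarity(q1, q2):
    """Return |A ∩ B| / |A ∪ B| for the token sets of two questions."""
    a, b = tokenize(q1), tokenize(q2)
    if not a and not b:
        # Two empty questions carry no evidence of duplication.
        return 0.0
    return len(a & b) / len(a | b)


def is_duplicate(q1, q2, threshold=0.5):
    """Flag a pair as duplicate when the overlap reaches the threshold.

    The default threshold is an arbitrary placeholder, not a tuned value.
    """
    return jaccard_similarity(q1, q2) >= threshold
```

A word-overlap score like this ignores word order and synonyms, so it mainly serves as a reference point against which learned models can be compared.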

Data Requirements

Below is the data required to successfully train/run all of the models.

In the current directory ("DuQI"), create a folder named "data" and populate it with the Quora training/test CSV files.

The final directory should look like:

  • data
    • Quora training/test CSV files
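
The layout above can be checked before training with a small script. This is a sketch under assumptions: the helper name `find_quora_csvs` is hypothetical, and it accepts any `.csv` file in the folder because the README does not name the files.

```python
from pathlib import Path


def find_quora_csvs(repo_root="."):
    """Return the sorted CSV files found in <repo_root>/data.

    Raises FileNotFoundError if the data folder is missing or holds no CSVs.
    """
    data_dir = Path(repo_root) / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a data folder at {data_dir}")
    # The exact file names are not specified, so match every CSV in the folder.
    csv_files = sorted(data_dir.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {data_dir}")
    return csv_files
```

Calling `find_quora_csvs(".")` from the "DuQI" directory returns the paths of the CSV files, or fails early with a message that says which part of the layout is missing.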