authorship-verification

Authorship Verification 2020

University of Copenhagen & IT and Cognition & Language Processing 2

This repository contains the scripts used to generate the models for the PAN 2020 authorship verification shared task: https://pan.webis.de/clef20/pan20-web/author-identification.html . This project is intented to decide whether two texts have been written by the same author based on comparing the texts' writing styles. I developed all scripts in Jupyter Notebooks.

Running this project:

put the 'pan20-authorship-verification-training-small-truth.jsonl' and 'pan20-authorship-verification-training-small.jsonl' two files into data folder. Dataset can be dowloaded from https://pan.webis.de/data.html#pan20-authorship-verification
run the dataread.ipynb first to read dataset.
run feature_extract.ipynb. This script extracts all the features and saves them in the 'feature' folder.
run model.ipynb. This script read the features in feature folder, create feature matrix as input, train the models, make prediction using the trained models, and save the trained models in the 'model' folder.
run new_text_predict_test.ipynb. This script loads the trained model in the 'model' folder and predicts two texts written by an author that does not belong to the training set.

Features

Average Sentence Length
Average Word Length
Noun
Pronoun
Adjective
Numeral
Verb
Adverb
Period
Comma
Mispelling
Bi-grams
Tri-grams
Four-grams
TF-IDF and cosine similarity
Yule's K value
Lexical Richness
Tri-punctuations

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

authorship-verification

Running this project:

Features

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
feature		feature
LICENSE		LICENSE
README.md		README.md
dataread.ipynb		dataread.ipynb
feature_extract.ipynb		feature_extract.ipynb
model.ipynb		model.ipynb
new_text_predict_test.ipynb		new_text_predict_test.ipynb

License

xuemduan/authorship-verification

Folders and files

Latest commit

History

Repository files navigation

authorship-verification

Running this project:

Features

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages