Adversarial document model

Code needed to reproduce the results from "Modeling documents with Generative Adversarial Networks", presented at the NIPS Workshop on Adversarial Training, December 2016.

Requirements

Requires Python 3 (tested with 3.6.1). The remaining dependencies can then be installed via:

    $ pip install -r requirements.txt
    $ python -c "import nltk; nltk.download('punkt')"

Data format and preprocessing

You first need to preprocess any input data into the format expected by the model:

    $ python preprocess.py --input <path to input dataset> --output <path to output dataset> --vocab <path to vocab file>

where <path to input dataset> points to a directory containing an input dataset (described below), <path to output dataset> is the path of the newly created output dataset directory (which will contain the preprocessed data), and <path to vocab file> is the path to a vocabulary file (described below).

Datasets: A directory containing CSV files, one per set, with separate sets for training, validation and test. The files must be named training.csv, validation.csv and test.csv. Before preprocessing, each CSV file consists of two comma-delimited string fields: the first is the label and the second is the document body.
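As an illustration of the layout described above, here is a minimal sketch that writes a dataset directory with Python's standard csv module. The function name `write_dataset`, the toy rows, UTF-8 encoding and the absence of a header row are all assumptions made for this example; check them against what preprocess.py actually expects.

```python
import csv
from pathlib import Path


def write_dataset(directory, splits):
    """Write training.csv, validation.csv and test.csv into `directory`.

    `splits` maps each set name to a list of (label, document body) pairs.
    Assumption: no header row and UTF-8 encoding (not stated in this README).
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for name in ("training", "validation", "test"):
        # newline="" lets the csv module control line endings itself.
        with open(directory / f"{name}.csv", "w", newline="", encoding="utf-8") as f:
            # Comma-delimited; fields containing commas are quoted automatically.
            writer = csv.writer(f)
            for label, body in splits[name]:
                writer.writerow([label, body])


# Toy rows, invented purely for illustration.
example_splits = {
    "training": [("sci.space", "The shuttle launch was delayed, again.")],
    "validation": [("rec.autos", "Which tyres suit a wet track?")],
    "test": [("sci.space", "Orbital mechanics is unforgiving.")],
}
```

Calling `write_dataset("my_dataset", example_splits)` would produce a directory that can be passed as the `--input` argument of preprocess.py, assuming the assumptions above hold.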

Vocabulary files: A plain text file with one vocabulary token per line. This file must be created in advance; we do not provide a script for creating vocabularies. We do provide the vocabulary file used in our 20 Newsgroups experiment in data/20newsgroups.vocab.
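For other datasets, a vocabulary file in this one-token-per-line format could be built along the following lines. This is only a sketch: `build_vocab` is a hypothetical helper, the lowercase regex tokeniser is a stand-in (the tokenisation should match whatever preprocess.py applies, which is not described here), and the `max_size` default is arbitrary.

```python
import csv
import re
from collections import Counter


def build_vocab(training_csv, vocab_path, max_size=2000):
    """Write the `max_size` most frequent tokens of the training set, one per line.

    Assumptions: the document body is the second CSV field, and tokens are
    lowercase runs of ASCII letters (a placeholder tokeniser).
    """
    counts = Counter()
    with open(training_csv, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip malformed rows
            counts.update(re.findall(r"[a-z]+", row[1].lower()))

    tokens = [token for token, _ in counts.most_common(max_size)]
    with open(vocab_path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")
    return tokens
```

The resulting file can then be passed as the `--vocab` argument of preprocess.py.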

Training

The published results used the default parameters, so you just need to pass the input dataset and model output directories:

    $ python train.py --dataset <path to preprocessed dataset> --model <path to model output directory>

To view additional parameters:

    $ python train.py --help

Extracting document vectors and evaluating results

To evaluate the retrieval results:

    $ python evaluate.py --dataset <path to preprocessed dataset> --model <path to trained model directory>

To extract document vectors (saved to the model directory in NumPy text format):

    $ python vectors.py --dataset <path to preprocessed dataset> --model <path to trained model directory>
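Once extracted, the vectors can be loaded back with NumPy for further analysis. The sketch below assumes the file holds one whitespace-delimited row per document (the default layout of `np.savetxt`); the name of the file written by vectors.py is not given here, so the path is left to the caller. The helpers `load_vectors` and `nearest_documents` are hypothetical, and ranking by cosine similarity is an assumption for this example, not necessarily what evaluate.py does.

```python
import numpy as np


def load_vectors(path):
    """Load document vectors saved in NumPy text format (one row per document)."""
    # ndmin=2 keeps a 2-D array even if the file contains a single document.
    return np.loadtxt(path, ndmin=2)


def nearest_documents(vectors, query_index, k=5):
    """Return the indices of the k documents most cosine-similar to the query."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)  # guard against all-zero vectors
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # exclude the query document itself
    return np.argsort(-scores)[:k]
```

For example, `nearest_documents(load_vectors(path), 0)` would list the five documents closest to the first one, where `path` points at the vectors file in the trained model directory.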
