datagger

A Simple Dialog Act Tagger

This project assigns dialog act tags to the utterances of a discourse.

Introduction

The corpus we use comes from loria.fr and was built for a dialog system for learning French. See corpus/sample to get a feel for what the data looks like.

Software Dependencies

To use datagger, you need to install:

NLTK: NLTK is a leading platform for building Python programs to work with human language data. http://nltk.org/
CRF++: CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data. http://crfpp.googlecode.com/svn/trunk/doc/index.html
scikit-learn: scikit-learn is a Python package for machine learning. http://scikit-learn.org/stable/
KEA: KEA is a French tokenizer. https://github.com/boudinfl/kea
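Before running the scripts, it can help to confirm that these dependencies are reachable. The helper below is a hypothetical sketch, not part of datagger. It assumes NLTK and scikit-learn import as `nltk` and `sklearn`, that the KEA tokenizer imports as `kea` (this depends on how you installed it), and that CRF++ put `crf_learn` and `crf_test` on your `PATH`. It is written for Python 3, while the project's own scripts run under Python 2.

```python
import importlib.util
import shutil
from typing import Iterable, List

# Assumed import names; "kea" in particular depends on your installation.
PYTHON_MODULES = ["nltk", "sklearn", "kea"]
# Command-line tools installed by CRF++.
EXECUTABLES = ["crf_learn", "crf_test"]


def missing_dependencies(modules: Iterable[str], executables: Iterable[str]) -> List[str]:
    """Return top-level modules that cannot be found and executables not on PATH."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    gaps = missing_dependencies(PYTHON_MODULES, EXECUTABLES)
    print("All dependencies found." if not gaps else "Missing: " + ", ".join(gaps))
```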

Howto

To check the usage of the script, run

python2 CorpusBuilder.py --help

which prints:

usage: CorpusBuilder.py [-h] [--cv] [--path PATH] corpuspath {crf,sample}

positional arguments:
  corpuspath    The path to the corpus
  {crf,sample}  select what kind of corpus to generate. sample: to generate
                unlabled data. crf: to build training and test data for crf

optional arguments:
  -h, --help    show this help message and exit
  --cv          build cross_validation corpus
  --path PATH   place to put the generated corpus data
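The help text corresponds to a standard `argparse` interface. The parser below is a hypothetical reconstruction based only on that help output, not the actual CorpusBuilder.py source; the destination name `mode` for the `{crf,sample}` argument is an assumption. It shows how the positional arguments and optional flags combine on one command line (Python 3).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Reconstructed from the --help output; the real script may differ internally.
    parser = argparse.ArgumentParser(prog="CorpusBuilder.py")
    parser.add_argument("corpuspath", help="The path to the corpus")
    # "mode" is a hypothetical dest name; argparse displays it as {crf,sample}.
    parser.add_argument("mode", choices=["crf", "sample"],
                        help="sample: unlabeled data; crf: training and test data for CRF++")
    parser.add_argument("--cv", action="store_true", help="build cross_validation corpus")
    parser.add_argument("--path", help="place to put the generated corpus data")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["corpus", "crf", "--cv", "--path", "out"])
    print(args.corpuspath, args.mode, args.cv, args.path)  # corpus crf True out
```

Under this reading, options may appear before or after the two positional arguments, and omitting `--path` leaves it as `None`, so the script presumably falls back to some default output location.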

After generating the data in CRF++ format, train a model with

crf_learn template train_data model
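CRF++ reads one token per line, with whitespace-separated feature columns and the gold label in the last column; a blank line ends each sequence. For dialog act tagging, a natural mapping is one dialog per sequence and one utterance per line. The sketch below assumes exactly that mapping and uses two hypothetical feature columns (speaker and first word of the utterance); datagger's actual features may differ. It also defines a minimal CRF++ template using the `%x[row,col]` macro, where `row` is relative to the current utterance and `col` is a feature column.

```python
from typing import List, Sequence, Tuple

# One utterance: (speaker, first_word, dialog_act). Illustrative columns only.
Utterance = Tuple[str, str, str]


def to_crfpp(dialogs: Sequence[Sequence[Utterance]]) -> str:
    """Serialize dialogs into CRF++ format: one utterance per line,
    label in the last column, and a blank line after each dialog."""
    lines: List[str] = []
    for dialog in dialogs:
        for speaker, first_word, act in dialog:
            lines.append("\t".join([speaker, first_word, act]))
        lines.append("")  # blank line terminates the sequence
    return "\n".join(lines) + "\n"


# Unigram features on column 0 (speaker) and column 1 (first word) for the
# previous, current and next utterance, plus one combined feature; "B" adds
# bigram features over adjacent output labels.
TEMPLATE = """\
# Unigram
U00:%x[-1,0]
U01:%x[0,0]
U02:%x[1,0]
U10:%x[-1,1]
U11:%x[0,1]
U12:%x[1,1]
U13:%x[0,0]/%x[0,1]

# Bigram
B
"""

if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    dialogs = [[("tutor", "bonjour", "greeting"), ("learner", "salut", "greeting")]]
    print(to_crfpp(dialogs), end="")
```

Saving the two strings as `train_data` and `template` gives files in the shape that `crf_learn template train_data model` expects.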

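This produces a model file. To tag the test data with this model, run

crf_test -m model test_data > result_you_got

crf_test prints each input line with the predicted tag appended as the last column, so redirect its output to a file for evaluation.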
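Because crf_test keeps the input columns and appends its prediction, each output line ends with the gold label followed by the predicted label whenever the test data carries gold labels. The hypothetical helper below groups that output back into per-dialog sequences of (gold, predicted) pairs, which is handy for inspecting errors dialog by dialog. It assumes crf_test was run without `-v`, which would add extra probability lines.

```python
from typing import Iterable, List, Tuple

LabelPairs = List[Tuple[str, str]]


def read_crf_output(lines: Iterable[str]) -> List[LabelPairs]:
    """Group crf_test output into sequences of (gold, predicted) pairs.
    Gold is the second-to-last column; the prediction is the last column."""
    sequences: List[LabelPairs] = []
    current: LabelPairs = []
    for line in lines:
        cols = line.split()
        if not cols:  # blank line: end of a sequence
            if current:
                sequences.append(current)
                current = []
            continue
        current.append((cols[-2], cols[-1]))
    if current:  # the file may not end with a blank line
        sequences.append(current)
    return sequences
```

For example, `read_crf_output(open("result_you_got"))` returns one list per dialog in the test data.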

Evaluation

To evaluate the results, you can use the conlleval.pl script from the CoNLL-2000 chunking (phrase recognition) task (http://www.cnts.ua.ac.be/conll2000/chunking/output.html). Run:

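perl conlleval.pl -r -d '\t' < result_you_got

Here -r tells conlleval that the tags carry no B-/I- chunk prefixes, and -d '\t' sets the column delimiter to a tab.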
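For a quick check without Perl, token accuracy and per-label precision, recall and F1 can be computed directly from the last two columns of the crf_test output. This is a simplified approximation of what `conlleval.pl -r` reports, not a replacement for it: conlleval treats the `O` tag as "outside any chunk", which this sketch does not, so the numbers may differ if your label set uses `O`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def evaluate(lines: Iterable[str]) -> Tuple[float, Dict[str, Tuple[float, float, float]]]:
    """Token accuracy plus per-label (precision, recall, F1), reading gold from
    the second-to-last column and the prediction from the last column."""
    correct = total = 0
    tp: Counter = Counter()
    n_pred: Counter = Counter()
    n_gold: Counter = Counter()
    for line in lines:
        cols = line.split()
        if not cols:
            continue  # skip sequence separators
        gold, pred = cols[-2], cols[-1]
        total += 1
        n_gold[gold] += 1
        n_pred[pred] += 1
        if gold == pred:
            correct += 1
            tp[gold] += 1
    per_label = {}
    for label in set(n_gold) | set(n_pred):
        p = tp[label] / n_pred[label] if n_pred[label] else 0.0
        r = tp[label] / n_gold[label] if n_gold[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        per_label[label] = (p, r, f)
    accuracy = correct / total if total else 0.0
    return accuracy, per_label
```

For example, `evaluate(open("result_you_got"))` returns the overall accuracy and a dictionary keyed by dialog act label.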

For more information about this project, read our report.

qiuwei/datagger

Folders and files

Latest commit

History

Repository files navigation

datagger

A Simple Dialog Act Tagger

Introduction

Software Depends

Howto

Evaluation`

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages