Sequence-to-Sequence (Seq2Seq)

Sequence-to-Sequence (Seq2Seq) is a general end-to-end framework that maps sequences in a source domain to sequences in a target domain. A Seq2Seq model first reads the source sequence with an encoder to build vector-based 'understanding' representations, then passes them to a decoder that generates the target sequence, which is why it is also referred to as the encoder-decoder architecture. Many NLP tasks have benefited from the Seq2Seq framework, including machine translation, text summarization and question answering. Seq2Seq models vary in their exact architecture: a multi-layer bi-directional RNN (e.g. LSTM, GRU) for the encoder and a multi-layer uni-directional RNN with autoregressive decoding (e.g. greedy, beam search) for the decoder are natural choices for the vanilla Seq2Seq model. The attention mechanism was later introduced to let the decoder pay 'attention' to relevant encoder outputs directly, which brings a significant improvement on top of the already successful vanilla Seq2Seq model. Furthermore, the 'Transformer', a novel architecture based entirely on self-attention, has outperformed both recurrent and convolutional models on various tasks; although it is out of scope for this repo, I'd like to refer interested readers to this post for more details.

Figure 1: Encoder-Decoder architecture of Seq2Seq model
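The description above maps onto a fairly standard TensorFlow 1.x graph. Below is a minimal, hedged sketch of a vanilla encoder-decoder training graph with a single-layer Bi-LSTM encoder and a uni-directional LSTM decoder trained with teacher forcing; all tensor names, placeholder shapes and hyper-parameter values are illustrative assumptions, not this repo's exact implementation.

```python
# Minimal TF 1.x sketch of a vanilla encoder-decoder training graph
# (illustrative assumptions only, not this repo's exact code).
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, NUM_UNITS = 10000, 300, 512  # assumed toy hyper-parameters

src_ids = tf.placeholder(tf.int32, [None, None], name="src_ids")
src_len = tf.placeholder(tf.int32, [None], name="src_len")
tgt_in_ids = tf.placeholder(tf.int32, [None, None], name="tgt_in_ids")    # <s> w1 w2 ...
tgt_out_ids = tf.placeholder(tf.int32, [None, None], name="tgt_out_ids")  # w1 w2 ... </s>
tgt_len = tf.placeholder(tf.int32, [None], name="tgt_len")

src_embedding = tf.get_variable("src_embedding", [VOCAB_SIZE, EMBED_DIM])
tgt_embedding = tf.get_variable("tgt_embedding", [VOCAB_SIZE, EMBED_DIM])
src_emb = tf.nn.embedding_lookup(src_embedding, src_ids)
tgt_emb = tf.nn.embedding_lookup(tgt_embedding, tgt_in_ids)

# Encoder: single-layer Bi-LSTM reading the source sequence.
fw_cell = tf.nn.rnn_cell.LSTMCell(NUM_UNITS)
bw_cell = tf.nn.rnn_cell.LSTMCell(NUM_UNITS)
(enc_fw, enc_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, src_emb, sequence_length=src_len, dtype=tf.float32)
enc_outputs = tf.concat([enc_fw, enc_bw], -1)  # only used by the attention variant sketched later

# Bridge: project the concatenated forward/backward final states down to the
# decoder state size, one simple way to connect a Bi-LSTM encoder to a
# uni-directional LSTM decoder.
init_c = tf.layers.dense(tf.concat([state_fw.c, state_bw.c], -1), NUM_UNITS, tf.tanh)
init_h = tf.layers.dense(tf.concat([state_fw.h, state_bw.h], -1), NUM_UNITS, tf.tanh)
dec_init_state = tf.nn.rnn_cell.LSTMStateTuple(init_c, init_h)

# Decoder: uni-directional LSTM trained with teacher forcing; a projection
# over the target vocabulary produces the output logits.
dec_cell = tf.nn.rnn_cell.LSTMCell(NUM_UNITS)
output_layer = tf.layers.Dense(VOCAB_SIZE, use_bias=False)
helper = tf.contrib.seq2seq.TrainingHelper(tgt_emb, tgt_len)
decoder = tf.contrib.seq2seq.BasicDecoder(
    dec_cell, helper, initial_state=dec_init_state, output_layer=output_layer)
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)

# Cross-entropy loss masked by target length (targets assumed padded exactly
# to the batch maximum target length).
mask = tf.sequence_mask(tgt_len, tf.shape(tgt_out_ids)[1], dtype=tf.float32)
loss = tf.contrib.seq2seq.sequence_loss(outputs.rnn_output, tgt_out_ids, mask)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```

At inference time, the TrainingHelper is swapped for greedy or beam-search decoding; a beam-search sketch appears in the Experiment section below.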

Setting

  • Python 3.6.6
  • TensorFlow 1.12
  • NumPy 1.15.4

DataSet

  • IWSLT'15 English-Vietnamese is a small dataset for the English-Vietnamese translation task. It contains 133K training pairs, and the top 50K most frequent words are used as the vocabularies.
  • WMT'14 English-French is a large dataset for the English-French translation task. The goals of this WMT shared translation task are (1) to investigate the applicability of various MT techniques, (2) to examine the special challenges in translating between English and French, (3) to create publicly available corpora for training and evaluation, and (4) to generate up-to-date performance numbers as a basis of comparison in future research.
  • fastText is an open-source library for efficient text classification and representation learning. Pre-trained word vectors for 157 languages are distributed by fastText; these models were trained on Common Crawl and Wikipedia using CBOW with position-weights, in dimension 300, with character n-grams of length 5, a window of size 5 and 10 negatives.
  • GloVe is an unsupervised learning algorithm for obtaining vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. A minimal sketch of loading such plain-text pre-trained vectors into an embedding matrix is shown after this list.
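Both the fastText .vec files and the GloVe files are plain text with one token per line followed by its vector components (fastText additionally prepends a header line with the vocabulary size and dimension), so a single loader covers both. The sketch below is illustrative only; it assumes a whitespace-separated file and a pre-built word-to-index vocabulary, and the file name and vocabulary in the usage comment are hypothetical.

```python
import numpy as np

def load_pretrained_embedding(path, vocab, dim=300):
    """Build a [len(vocab), dim] embedding matrix from a GloVe / fastText .vec text file.

    `vocab` maps word -> row index; words not found in the file keep a small
    random vector. Illustrative sketch only.
    """
    embedding = np.random.uniform(-0.1, 0.1, (len(vocab), dim)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:  # also skips the fastText header line
                continue
            word = parts[0]
            if word in vocab:
                embedding[vocab[word]] = np.asarray(parts[1:], dtype=np.float32)
    return embedding

# Hypothetical usage:
# vocab = {"<pad>": 0, "<unk>": 1, "the": 2}  # built from the training corpus
# matrix = load_pretrained_embedding("cc.en.300.vec", vocab, dim=300)
```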

Usage

  • Run experiment
# run experiment in train mode
python seq2seq_run.py --mode train --config config/config_seq2seq_template.xxx.json
# run experiment in eval mode
python seq2seq_run.py --mode eval --config config/config_seq2seq_template.xxx.json
  • Encode source
# encode source as CoVe vector
python seq2seq_run.py --mode encode --config config/config_seq2seq_template.xxx.json
  • Search hyper-parameter (a minimal sketch of the random-search idea is shown after this list)
# random search hyper-parameters
python hparam_search.py --base-config config/config_seq2seq_template.xxx.json --search-config config/config_search_template.xxx.json --num-group 10 --random-seed 100 --output-dir config/search
  • Visualize summary
# visualize summary via tensorboard
tensorboard --logdir=output
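hparam_search.py generates groups of configs by randomly sampling hyper-parameters from a search spec on top of a base config; its exact config keys are repo-specific, but the underlying idea is plain random search. Below is a minimal, self-contained sketch of that idea; all keys and ranges are made-up examples, not the repo's schema.

```python
import copy
import json
import random

def random_search(base_config, search_space, num_group, seed, output_dir):
    """Write `num_group` config variants, each with randomly sampled values.

    `search_space` maps a config key to either a list of choices or a
    (low, high) range; the keys and ranges here are illustrative, not the
    ones used by hparam_search.py.
    """
    random.seed(seed)
    for i in range(num_group):
        config = copy.deepcopy(base_config)
        for key, space in search_space.items():
            if isinstance(space, list):
                config[key] = random.choice(space)      # categorical choice
            else:
                low, high = space
                config[key] = random.uniform(low, high)  # continuous range
        with open("{}/config_{}.json".format(output_dir, i), "w") as f:
            json.dump(config, f, indent=2)

# Hypothetical example:
# base = json.load(open("config/config_seq2seq_template.xxx.json"))
# space = {"learning_rate": (1e-4, 1e-2), "unit_dim": [256, 512, 1024]}
# random_search(base, space, num_group=10, seed=100, output_dir="config/search")
```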

Experiment

Vanilla Seq2Seq

Figure 2: Vanilla Seq2Seq architecture

| IWSLT'15 EN-VI | Perplexity | BLEU Score |
|---|---|---|
| Dev | 25.09 | 9.47 |
| Test | 25.87 | 9.35 |

Table 1: The performance of the vanilla Seq2Seq model on the IWSLT'15 English-Vietnamese task with the following setting: (1) for the encoder, model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) for the decoder, model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300

| IWSLT'15 VI-EN | Perplexity | BLEU Score |
|---|---|---|
| Dev | 29.52 | 8.49 |
| Test | 33.16 | 7.88 |

Table 2: The performance of the vanilla Seq2Seq model on the IWSLT'15 Vietnamese-English task with the following setting: (1) for the encoder, model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) for the decoder, model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300
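The tables above decode with beam size = 10. In TensorFlow 1.x, beam search is typically obtained by swapping the training helper for tf.contrib.seq2seq.BeamSearchDecoder and tiling the decoder's initial state across beams. The sketch below reuses the assumed names from the earlier vanilla training sketch (src_ids, dec_cell, dec_init_state, tgt_embedding, output_layer); the start/end token ids are illustrative, and this is not necessarily how the repo wires its own decoding.

```python
# Beam-search decoding sketch (builds on the assumed tensors from the earlier
# vanilla training sketch; start/end token ids are illustrative).
import tensorflow as tf

BEAM_WIDTH, SOS_ID, EOS_ID, MAX_LEN = 10, 1, 2, 300

batch_size = tf.shape(src_ids)[0]
start_tokens = tf.fill([batch_size], SOS_ID)

# Replicate the decoder initial state once per beam.
tiled_init_state = tf.contrib.seq2seq.tile_batch(dec_init_state, multiplier=BEAM_WIDTH)

beam_decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=dec_cell,
    embedding=tgt_embedding,
    start_tokens=start_tokens,
    end_token=EOS_ID,
    initial_state=tiled_init_state,
    beam_width=BEAM_WIDTH,
    output_layer=output_layer)  # shares the training-time output projection

beam_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    beam_decoder, maximum_iterations=MAX_LEN)
predicted_ids = beam_outputs.predicted_ids  # [batch, time, beam]; beam 0 is the best hypothesis
```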

Attention-based Seq2Seq

Figure 3: Attention-based Seq2Seq architecture

| IWSLT'15 EN-VI | Perplexity | BLEU Score |
|---|---|---|
| Dev | 12.56 | 22.41 |
| Test | 10.79 | 25.23 |

Table 3: The performance of the attention-based Seq2Seq model on the IWSLT'15 English-Vietnamese task with the following setting: (1) for the encoder, model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) for the decoder, model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300, att type = scaled multiplicative

| IWSLT'15 VI-EN | Perplexity | BLEU Score |
|---|---|---|
| Dev | 11.83 | 19.37 |
| Test | 10.42 | 21.40 |

Table 4: The performance of the attention-based Seq2Seq model on the IWSLT'15 Vietnamese-English task with the following setting: (1) for the encoder, model type = Bi-LSTM, num layers = 1, unit dim = 512; (2) for the decoder, model type = LSTM, num layers = 2, unit dim = 512, beam size = 10; (3) pre-trained embedding = false, max len = 300, att type = scaled multiplicative
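The 'att type = scaled multiplicative' setting in Tables 3 and 4 refers to a multiplicative (Luong-style) attention score between the decoder state h_t and each encoder output h_s, score(h_t, h_s) = h_t^T W h_s, with an additional scaling factor. One plausible TF 1.x realization, again a hedged sketch built on the assumed names from the vanilla sketch and not necessarily the repo's own attention code, is LuongAttention with scale=True (which multiplies the multiplicative score by a trainable scalar) wrapped around the decoder cell:

```python
# Attention-wrapped decoder cell sketch (assumes enc_outputs, src_len, src_ids,
# dec_init_state and NUM_UNITS from the earlier vanilla sketch; illustrative only).
import tensorflow as tf

attention = tf.contrib.seq2seq.LuongAttention(
    num_units=NUM_UNITS,
    memory=enc_outputs,                 # concatenated Bi-LSTM encoder outputs
    memory_sequence_length=src_len,
    scale=True)                         # scaled multiplicative score

attn_dec_cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.nn.rnn_cell.LSTMCell(NUM_UNITS),
    attention,
    attention_layer_size=NUM_UNITS)

# The wrapped cell carries extra attention state, so its initial state comes
# from zero_state() and is seeded with the encoder bridge state.
batch_size = tf.shape(src_ids)[0]
attn_init_state = attn_dec_cell.zero_state(batch_size, tf.float32).clone(
    cell_state=dec_init_state)

# attn_dec_cell / attn_init_state then replace dec_cell / dec_init_state in the
# BasicDecoder (training) or BeamSearchDecoder (inference) sketches above; for
# beam search, enc_outputs, src_len and the seeding cell state must also be
# tiled across beams with tf.contrib.seq2seq.tile_batch.
```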

Reference