Ngram entropy pruning #4594

Merged: 5 commits, Jul 21, 2021


huangruizhe
Contributor

The pruned models are nearly the same (about 99%) as those produced by SRILM.

The unpruned original model (egs/swbd/s5c/data/local/lm/sw1.o3g.kn.gz):

\data
ngram 1=30275
ngram 2=455846
ngram 3=272601
file heldout: 10000 sentences, 118254 words, 0 OOVs
0 zeroprobs, logprob= -250951.4 ppl= 90.50555 ppl1= 132.4765
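As a sanity check on how to read these lines, here is a minimal sketch that recomputes `ppl` and `ppl1` from the printed totals. It assumes the output follows SRILM's usual `ngram -ppl` convention: `ppl` normalizes by scored words plus sentence-end tokens, and `ppl1` by scored words only. The function name `srilm_perplexities` is made up for illustration and is not part of this PR.

```python
def srilm_perplexities(logprob, sentences, words, oovs=0, zeroprobs=0):
    """Recompute (ppl, ppl1) from the totals in an SRILM-style -ppl report.

    Assumption: logprob is a base-10 log probability, and OOVs and
    zero-probability words are excluded from the normalizer.
    """
    scored_words = words - oovs - zeroprobs
    # ppl counts one end-of-sentence token per sentence.
    ppl = 10 ** (-logprob / (scored_words + sentences))
    # ppl1 excludes the end-of-sentence tokens.
    ppl1 = 10 ** (-logprob / scored_words)
    return ppl, ppl1


# Totals copied from the unpruned model's report above.
ppl, ppl1 = srilm_perplexities(logprob=-250951.4, sentences=10000, words=118254)
print(f"ppl={ppl:.4f} ppl1={ppl1:.4f}")
```

Under that assumption the recomputed values should agree with the reported `ppl= 90.50555 ppl1= 132.4765` up to the rounding of `logprob`.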

threshold=4.7e-5:

SRILM:
\data
ngram 1=30275
ngram 2=4681
ngram 3=655
file heldout: 10000 sentences, 118254 words, 0 OOVs
0 zeroprobs, logprob= -290823.7 ppl= 185.1658 ppl1= 287.9481

Our version:
\data
ngram 1=30275
ngram 2=4626
ngram 3=655
file heldout: 10000 sentences, 118254 words, 0 OOVs
0 zeroprobs, logprob= -291397.2 ppl= 187.0819 ppl1= 291.1811

threshold=1e-6:

SRILM:
\data
ngram 1=30275
ngram 2=155789
ngram 3=55781
file heldout: 10000 sentences, 118254 words, 0 OOVs
0 zeroprobs, logprob= -256473.7 ppl= 99.9384 ppl1= 147.5154

Our version:
\data
ngram 1=30275
ngram 2=155465
ngram 3=55781
file heldout: 10000 sentences, 118254 words, 0 OOVs
0 zeroprobs, logprob= -256570.9 ppl= 100.113 ppl1= 147.7948

threshold=3e-8:

SRILM:
\data
ngram 1=30275
ngram 2=442951
ngram 3=245963
file heldout: 10000 sentences, 118254 words, 0 OOVs
0 zeroprobs, logprob= -251054.1 ppl= 90.67255 ppl1= 132.7417

Our version:
\data
ngram 1=30275
ngram 2=440476
ngram 3=245963
file heldout: 10000 sentences, 118254 words, 0 OOVs
0 zeroprobs, logprob= -251049.7 ppl= 90.6654 ppl1= 132.7303
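
To put the three comparisons side by side, the following sketch computes the relative difference between SRILM and this version for each threshold. The numbers are copied from the reports above; the names `RESULTS`, `relative_diff`, and `summarize` are hypothetical helpers written for this comment, not code from the PR.

```python
# (threshold, SRILM (2-gram, 3-gram) counts, our counts, SRILM ppl, our ppl),
# all copied from the reports above.
RESULTS = [
    ("4.7e-5", (4681, 655), (4626, 655), 185.1658, 187.0819),
    ("1e-6", (155789, 55781), (155465, 55781), 99.9384, 100.113),
    ("3e-8", (442951, 245963), (440476, 245963), 90.67255, 90.6654),
]


def relative_diff(reference, value):
    """Signed relative difference of value with respect to reference."""
    return (value - reference) / reference


def summarize(results):
    """Return one dict per threshold with relative differences vs. SRILM."""
    rows = []
    for threshold, srilm_counts, our_counts, srilm_ppl, our_ppl in results:
        rows.append({
            "threshold": threshold,
            "bigram_diff": relative_diff(srilm_counts[0], our_counts[0]),
            "trigram_diff": relative_diff(srilm_counts[1], our_counts[1]),
            "ppl_diff": relative_diff(srilm_ppl, our_ppl),
        })
    return rows


for row in summarize(RESULTS):
    print(
        f"threshold={row['threshold']}: "
        f"2-grams {row['bigram_diff']:+.2%}, "
        f"3-grams {row['trigram_diff']:+.2%}, "
        f"ppl {row['ppl_diff']:+.2%}"
    )
```

In all three runs the trigram counts are identical and only the number of retained bigrams differs.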

@danpovey
Contributor

Thanks!! Merging.
