New force-alignment API and two-pass alignment to get phone/state durations #300

Merged: 55 commits merged into master from enhanced_alignment, Sep 27, 2022


@dhdaines (Contributor) commented Sep 21, 2022

Now you can (relatively) easily do a second pass of alignment to get phone durations after decoding or word alignment.

Also, word alignment now uses FSG (finite-state grammar) search, like SoundSwallower, so it's really fast and also handles silence and alternate pronunciations for you.
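A minimal sketch of the final step, turning a phone-level alignment into durations in seconds. It assumes each aligned phone is reported as a start frame plus a frame count, and that the front end runs at PocketSphinx's default rate of 100 frames per second (the `-frate` parameter). `PhoneSegment` and `phone_durations` are hypothetical names for illustration, not part of the PocketSphinx API.

```python
from dataclasses import dataclass

# Assumption: PocketSphinx's default frame rate of 100 frames per second
# (-frate); change this if your configuration uses a different rate.
FRAME_RATE = 100


@dataclass
class PhoneSegment:
    """Hypothetical record for one aligned phone: label, first frame, frame count."""
    phone: str
    start: int     # index of the first frame
    duration: int  # number of frames


def phone_durations(segments, frame_rate=FRAME_RATE):
    """Convert frame-based segments to (phone, start_seconds, duration_seconds)."""
    return [
        (seg.phone, seg.start / frame_rate, seg.duration / frame_rate)
        for seg in segments
    ]


if __name__ == "__main__":
    # Made-up segments for illustration only, not real decoder output.
    segs = [
        PhoneSegment("SIL", 0, 12),
        PhoneSegment("G", 12, 5),
        PhoneSegment("OW", 17, 14),
    ]
    for phone, start, dur in phone_durations(segs):
        print(f"{phone}\t{start:.2f}s\t{dur:.2f}s")
```

Keeping the frame rate as a parameter avoids hard-coding 100 fps, since a model trained with a different frame rate would otherwise produce silently wrong durations.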

@dhdaines assigned and then unassigned @lenzo-ka, Sep 21, 2022
@lenzo-ka (Contributor)

Excited to check this out! I'm at Interspeech and half a day out of phase, but I'll take a look shortly.

@dhdaines (Contributor, Author)

No problem! The CLI for state alignment isn't quite there yet, but it's coming soon (tonight, I hope).

@jsalsman (Contributor) commented Sep 21, 2022 via email

@dhdaines (Contributor, Author) commented Sep 21, 2022

> Fantastic! I also hope to try this out ASAP. I wonder whether constraining to the first pass's word boundaries will help. It seems like it can't hurt, but it would be interesting to measure how much.

It will definitely make alignment faster. It may also make it more accurate, though I'm not certain of this; I'll have to look at how I implemented this back in 2006: https://www.cs.cmu.edu/~dhuggins/Publications/phlab.pdf

EDIT: that paper was about forward-backward, not alignment, so it's not the same thing at all. In that case I implemented something like semi-Viterbi training, setting "impossible" phone sequences to zero probability, which resulted in models that were better for alignment (but somewhat worse for recognition).

Note that we *won't* do state alignment here for the moment, as it is of dubious use unless you are doing unsupervised MLLR, which should get a specific implementation.
@dhdaines marked this pull request as ready for review, September 22, 2022 01:30

@lenzo-ka (Contributor) left a comment

Hoping for state level alignments, and frame level scores also, but LGTM and WFM

@dhdaines (Contributor, Author)

> Hoping for state level alignments, and frame level scores also, but LGTM and WFM

State-level alignments are already available in the Python API (see cython/test/alignment_test.py for an example). It is now easy to add them to the command-line front-end as well, so I'll do that (though they won't be on by default).
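A hypothetical sketch of how a three-level alignment (words containing phones containing HMM states) can be walked, with a check that each level's frame spans exactly tile its parent's span. The `Node` class, its fields, and state labels such as `G.0` are assumptions made for illustration; for the actual Python API, cython/test/alignment_test.py is the reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical alignment node: a word, phone, or HMM state spanning frames."""
    name: str
    start: int     # first frame
    duration: int  # number of frames
    children: List["Node"] = field(default_factory=list)


def flatten(node, depth=0):
    """Yield (depth, name, start, duration) for node and descendants, depth-first."""
    yield depth, node.name, node.start, node.duration
    for child in node.children:
        yield from flatten(child, depth + 1)


def spans_are_consistent(node):
    """True if every node's children cover its frame span contiguously, recursively."""
    if not node.children:
        return True
    frame = node.start
    for child in node.children:
        if child.start != frame:
            return False  # gap or overlap between siblings
        frame += child.duration
    if frame != node.start + node.duration:
        return False  # children do not end where the parent ends
    return all(spans_are_consistent(c) for c in node.children)


if __name__ == "__main__":
    # Made-up frame numbers for the word "go", three states per phone.
    g = Node("G", 12, 5, [Node("G.0", 12, 1), Node("G.1", 13, 2), Node("G.2", 15, 2)])
    ow = Node("OW", 17, 14, [Node("OW.0", 17, 4), Node("OW.1", 21, 6), Node("OW.2", 27, 4)])
    word = Node("go", 12, 19, [g, ow])
    for depth, name, start, dur in flatten(word):
        print("  " * depth + f"{name} start={start} frames={dur}")
    print("consistent:", spans_are_consistent(word))
```

Since phone and state durations come from a second alignment pass constrained by the first, a check like this is a cheap way to confirm the nested segments line up before using them.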

@dhdaines merged commit 68c5db8 into master, Sep 27, 2022
@dhdaines deleted the enhanced_alignment branch, September 28, 2022 19:20