[WIP] Add tokenize-hotwords option #1039

Open
wants to merge 3 commits into
base: master

Conversation

pkufool
Contributor

@pkufool pkufool commented Jun 21, 2024

This PR adds a tokenize-hotwords option for hotwords. Currently, tokenizing hotwords is supported only for models trained on cjkchar, bpe, and cjkchar+bpe. Those who want to use hotwords with other modeling units can set --tokenize-hotwords to false and pre-tokenize the hotwords before passing them to the decoder.
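
For illustration only, here is a rough sketch of what pre-tokenizing a hotword could look like when --tokenize-hotwords is set to false. It assumes a character-based (cjkchar-style) unit in which each CJK character is one token and tokens are separated by spaces. The helper name `pretokenize_cjkchar` is hypothetical and not part of this PR, and the exact token format the decoder expects is an assumption here.

```python
def is_cjk(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block is treated as CJK.
    return "\u4e00" <= ch <= "\u9fff"


def pretokenize_cjkchar(hotword: str) -> str:
    """Split a hotword into space-separated tokens (hypothetical helper).

    Each CJK character becomes its own token. Runs of other characters
    are kept whole; a bpe-based model would still need to split those
    runs with its own BPE model, which is not done here.
    """
    tokens = []
    buf = []  # accumulates a run of non-CJK, non-space characters

    def flush() -> None:
        if buf:
            tokens.append("".join(buf))
            buf.clear()

    for ch in hotword.strip():
        if is_cjk(ch):
            flush()
            tokens.append(ch)
        elif ch.isspace():
            flush()
        else:
            buf.append(ch)
    flush()
    return " ".join(tokens)


if __name__ == "__main__":
    for word in ["深度学习", "hello 世界"]:
        print(pretokenize_cjkchar(word))
```

Other modeling units (phonemes, for example) would need their own lexicon or tokenizer in place of this character split.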

@w11wo
Contributor

w11wo commented Jul 26, 2024

Hi @pkufool, is there any update on this PR? Thanks beforehand!

@pkufool
Contributor Author

pkufool commented Jul 29, 2024

> Hi @pkufool, is there any update on this PR? Thanks beforehand!

Oh, I thought this was merged, will have a look.

@w11wo
Contributor

w11wo commented Aug 26, 2024

Hi @pkufool, sorry to keep tagging you. Are there any updates? 🙏


2 participants