Cannot train with another language #86

Open
phamkhactu opened this issue Sep 6, 2023 · 0 comments

Thanks for your excellent work!

I want to train my G2P model on another language, in my case Vietnamese. Here is a sample of my lexicon:

phạc ph a_T5 c2
num n u_T0 m2
rim r i_T0 m2
giẫn gi a3_T4 n2
toăm t oa2_T0 m2
lịu l iu_T5
cựi c u2i_T5
õng o_T4 ng2

When I run training, I get this error:

INFO:phonetisaurus-train:2023-09-06 17:51:21:  Checking command configuration...
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  Directory does not exist.  Trying to create.
INFO:phonetisaurus-train:2023-09-06 17:51:21:  Checking lexicon for reserved characters: '}', '|', '_'...
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  arpa_path:  train/model.o8.arpa
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  corpus_path:  train/model.corpus
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  dir_prefix:  train
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  grow:  False
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  lexicon_file:  /tmp/tmp53qaxdn7.txt
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  logger:  <Logger phonetisaurus-train (DEBUG)>
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  makeJointNgramCommand:  <bound method G2PModelTrainer._mitlm of <__main__.G2PModelTrainer object at 0x7fd3aae0ec40>>
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  model_path:  train/model.fst
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  model_prefix:  model
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  ngram_order:  8
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq1_del:  False
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq1_max:  2
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq2_del:  True
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  seq2_max:  2
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  verbose:  True
DEBUG:phonetisaurus-train:2023-09-06 17:51:21:  phonetisaurus-align --input=/tmp/tmp53qaxdn7.txt --ofile=train/model.corpus --seq1_del=false --seq2_del=true --seq1_max=2 --seq2_max=2 --grow=false
INFO:phonetisaurus-train:2023-09-06 17:51:21:  Aligning lexicon...
GitRevision: package
Loading input file: /tmp/tmp53qaxdn7.txt
Please provide a valid input file.
ERROR:phonetisaurus-train:2023-09-06 17:51:21:  Alignment failed.  Exiting.
Traceback (most recent call last):
  File "/home/tupk/anaconda3/envs/nlp/bin/phonetisaurus", line 8, in <module>
    sys.exit(main())
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/phonetisaurus/__main__.py", line 74, in main
    do_train(args, casing, env)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/phonetisaurus/__main__.py", line 209, in do_train
    train(lexicon=lexicon, model_path=args.model, corpus_path=args.corpus, env=env)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/phonetisaurus/__init__.py", line 121, in train
    subprocess.check_call(train_cmd, cwd=temp_dir_str, env=env)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['phonetisaurus-train', '--lexicon', '/tmp/tmp53qaxdn7.txt', '--seq2_del', '--verbose']' returned non-zero exit status 1.

If I train with an English lexicon instead, there is no problem.
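
The log above mentions a check for the reserved characters '}', '|' and '_', and the phoneme symbols in the Vietnamese lexicon contain '_'. Whether that is related to the alignment failure is not confirmed here. As a minimal sketch, a lexicon could be scanned for those characters like this (the function name `find_reserved` and the `sample` list are hypothetical, and the sample lines are copied from the lexicon excerpt above):

```python
# Characters named in the phonetisaurus-train log line
# "Checking lexicon for reserved characters: '}', '|', '_'..."
RESERVED = ("}", "|", "_")


def find_reserved(lines):
    """Return (line_number, text, found_chars) for each line that
    contains at least one of the reserved characters."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        found = [ch for ch in RESERVED if ch in text]
        if found:
            hits.append((number, text, found))
    return hits


# A few entries from the Vietnamese lexicon in this issue.
sample = [
    "phạc ph a_T5 c2",
    "num n u_T0 m2",
    "lịu l iu_T5",
]

for number, text, found in find_reserved(sample):
    print(f"line {number}: {text!r} contains {found}")
```

To check a whole file, the lines can be read with `open(path, encoding="utf-8")` and passed to the same function.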

How can I fix it?
Thank you
