Problem during preparing G #23

Closed
EmreOzkose opened this issue Aug 23, 2021 · 3 comments

@EmreOzkose
Contributor

Hi, I am trying to train a TDNN-LSTM model on Turkish data. I ran one experiment successfully (including the decoding part) with a smaller language model. Then I tried a new corpus for language modeling. During the prepare-G step, I got this error:

2021-08-23 11:35:45 (prepare.sh:118:main) Stage 6: Compile HLG
2021-08-23 11:35:46,389 INFO [compile_hlg.py:126] Processing data/lang_phone
2021-08-23 11:35:46,859 INFO [lexicon.py:99] Converting L.pt to Linv.pt
2021-08-23 11:35:47,640 INFO [compile_hlg.py:48] Building ctc_topo. max_token_id: 52
2021-08-23 11:35:47,826 INFO [compile_hlg.py:57] Loading G_3_gram.fst.txt
2021-08-23 11:38:15,955 INFO [compile_hlg.py:68] Intersecting L and G
2021-08-23 11:51:37,889 INFO [compile_hlg.py:70] LG shape: (301909252, None)
2021-08-23 11:51:37,889 INFO [compile_hlg.py:72] Connecting LG
2021-08-23 11:51:37,889 INFO [compile_hlg.py:74] LG shape after k2.connect: (301909252, None)
2021-08-23 11:51:37,889 INFO [compile_hlg.py:76] <class 'torch.Tensor'>
2021-08-23 11:51:37,889 INFO [compile_hlg.py:77] Determinizing LG
2021-08-23 12:09:11,585 INFO [compile_hlg.py:80] <class '_k2.RaggedInt'>
2021-08-23 12:09:11,585 INFO [compile_hlg.py:82] Connecting LG after k2.determinize
2021-08-23 12:09:11,585 INFO [compile_hlg.py:85] Removing disambiguation symbols on LG
[F] /usr/share/miniconda/envs/k2/conda-bld/k2_1628135473078/work/k2/csrc/tensor.cu:159:k2::Tensor::Tensor(k2::Dtype, const k2::Shape&, k2::RegionPtr, int32_t) Check failed: int64_t(impl_->byte_offset) + begin_elem * element_size >= 0 (-1246502780 vs. 0) 


[ Stack-Trace: ]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/libk2_log.so(k2::internal::GetStackTrace()+0x4c) [0x7fb12e7c76bc]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/libk2context.so(k2::Tensor::Tensor(k2::Dtype, k2::Shape const&, std::shared_ptr<k2::Region>, int)+0x6da) [0x7fb12ed24aca]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/libk2context.so(k2::Array2<int>::Col(int)+0x13a) [0x7fb12ecd03ba]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/libk2context.so(+0x27e0d9) [0x7fb12ecc40d9]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/libk2context.so(k2::Index(k2::RaggedShape&, int, k2::Array1<int> const&, k2::Array1<int>*)+0x1da) [0x7fb12ecc649a]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xc5395) [0x7fb134ced395]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0xabc90) [0x7fb134cd3c90]
/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so(+0x1dfcf) [0x7fb134c45fcf]
python3(PyCFunction_Call+0x54) [0x555e1b3b3914]
python3(_PyObject_MakeTpCall+0x31e) [0x555e1b3b6ebe]
python3(_PyEval_EvalFrameDefault+0x52f6) [0x555e1b458986]
python3(_PyFunction_Vectorcall+0x1a6) [0x555e1b43b646]
python3(_PyEval_EvalFrameDefault+0x947) [0x555e1b453fd7]
python3(_PyFunction_Vectorcall+0x1a6) [0x555e1b43b646]
python3(_PyEval_EvalFrameDefault+0x947) [0x555e1b453fd7]
python3(_PyEval_EvalCodeWithName+0x2c3) [0x555e1b43a433]
python3(_PyFunction_Vectorcall+0x378) [0x555e1b43b818]
python3(_PyEval_EvalFrameDefault+0x1822) [0x555e1b454eb2]
python3(_PyFunction_Vectorcall+0x1a6) [0x555e1b43b646]
python3(_PyEval_EvalFrameDefault+0x4d33) [0x555e1b4583c3]
python3(_PyFunction_Vectorcall+0x1a6) [0x555e1b43b646]
python3(_PyEval_EvalFrameDefault+0x947) [0x555e1b453fd7]
python3(_PyFunction_Vectorcall+0x1a6) [0x555e1b43b646]
python3(_PyEval_EvalFrameDefault+0x947) [0x555e1b453fd7]
python3(_PyEval_EvalCodeWithName+0x2c3) [0x555e1b43a433]
python3(PyEval_EvalCodeEx+0x39) [0x555e1b43b499]
python3(PyEval_EvalCode+0x1b) [0x555e1b4d6ecb]
python3(+0x252f63) [0x555e1b4d6f63]
python3(+0x26f033) [0x555e1b4f3033]
python3(+0x274022) [0x555e1b4f8022]
python3(PyRun_SimpleFileExFlags+0x1b2) [0x555e1b4f8202]
python3(Py_RunMain+0x36d) [0x555e1b4f877d]
python3(Py_BytesMain+0x39) [0x555e1b4f8939]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fb25ee04b97]
python3(+0x1e8f39) [0x555e1b46cf39]

Traceback (most recent call last):
  File "./local/compile_hlg.py", line 140, in <module>
    main()
  File "./local/compile_hlg.py", line 128, in main
    HLG = compile_HLG(lang_dir)
  File "./local/compile_hlg.py", line 92, in compile_HLG
    LG = k2.remove_epsilon(LG)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/fsa_algo.py", line 562, in remove_epsilon
    out_fsa = k2.utils.fsa_from_unary_function_ragged(fsa, ragged_arc, arc_map,
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/utils.py", line 515, in fsa_from_unary_function_ragged
    new_value = index(value, arc_map)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 335, in index
    return index_ragged(src, indexes)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 283, in index_ragged
    return _k2.index(src, indexes)
RuntimeError: Some bad things happed.

How can I solve this? My ARPA files are 2.1 GB and 6.2 GB for the 3-gram and 4-gram models respectively. Could it be a size issue? My language models are prepared with KenLM.

Is this relevant to icefall, or should I ask on the k2 repository?
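For anyone hitting the same question: one way to gauge whether the ARPA size alone is the problem is to load G by itself and look at its arc count before intersecting with L. The sketch below is only illustrative (the file path is an assumption; it mirrors how local/compile_hlg.py loads the text FST):

```python
# Hedged sketch, not from the thread: load G on its own and report its size.
import k2

with open("data/lm/G_3_gram.fst.txt") as f:  # path is an assumption
    G = k2.Fsa.from_openfst(f.read(), acceptor=False)

# Each k2 arc stores four 32-bit fields (roughly 16 bytes per arc), so an arc
# array approaching 2 GiB could run into int32 offset limits inside k2.
print("num states:", G.shape[0])
print("num arcs:  ", G.num_arcs)
print("approx arc storage (GB):", G.num_arcs * 16 / 1024**3)
```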

@csukuangfj
Collaborator

It seems that G is too large and there is an overflow in k2.
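The negative value in the failed check (`-1246502780 vs. 0`) is consistent with a 32-bit wrap-around. A purely illustrative back-of-the-envelope check (not from the thread):

```python
# If the offending byte offset had been computed in 64-bit arithmetic, the
# wrapped value -1246502780 would correspond to 2**32 - 1246502780 bytes,
# which exceeds INT32_MAX (2,147,483,647), i.e. the 32-bit offset overflowed.
true_offset = 2**32 - 1_246_502_780
print(true_offset)              # 3048464516
print(true_offset > 2**31 - 1)  # True: does not fit in int32
```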

@danpovey
Collaborator

Hm. I'd like to see some debug information about the shapes involved when this happens; I suspect the error may have occurred earlier and may have been fixable.
We should actually support loading KenLM LMs onto the GPU and using them via a deterministic-FST interface. But that will take some time.
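One minimal way to capture the shape information being asked for (a sketch only; the helper name is an assumption, and it just extends the logging compile_hlg.py already does) is to log the state and arc counts after each step that transforms LG:

```python
# Hypothetical logging helper for compile_hlg.py: call it after connect,
# determinize, and remove_epsilon so the step where LG blows up is visible.
import logging
import k2

def log_fsa_size(name: str, fsa: k2.Fsa) -> None:
    logging.info(f"{name}: num states = {fsa.shape[0]}, num arcs = {fsa.num_arcs}")
```

For example, `log_fsa_size("LG after determinize", LG)` right before the failing `k2.remove_epsilon(LG)` call.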

@EmreOzkose
Contributor Author

I tried to reproduce this case to get the shapes, but it takes too long (2-3 hours) and the script just gets killed (a memory-logging sketch follows the log below):

2021-08-24 09:29:22 (prepare.sh:44:main) pre_file_dir: /path/to/icefall/egs/sestek/ASR/pre_files
2021-08-24 09:29:22 (prepare.sh:119:main) Stage 6: Compile HLG
2021-08-24 09:29:23,508 INFO [compile_hlg.py:128] Processing data/lang_phone
2021-08-24 09:29:24,062 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-08-24 09:29:24,289 INFO [compile_hlg.py:48] Building ctc_topo. max_token_id: 52
2021-08-24 09:29:24,489 INFO [compile_hlg.py:53] Loading pre-compiled G_3_gram
2021-08-24 09:29:30,649 INFO [compile_hlg.py:68] Intersecting L and G
2021-08-24 09:42:25,292 INFO [compile_hlg.py:70] LG shape: (301909252, None)
2021-08-24 09:42:25,292 INFO [compile_hlg.py:72] Connecting LG
(301909252, None)
2021-08-24 09:42:25,292 INFO [compile_hlg.py:75] LG shape after k2.connect: (301909252, None)
(301909252, None)
2021-08-24 09:42:25,292 INFO [compile_hlg.py:77] <class 'torch.Tensor'>
2021-08-24 09:42:25,292 INFO [compile_hlg.py:78] Determinizing LG
2021-08-24 09:58:24,804 INFO [compile_hlg.py:81] <class '_k2.RaggedInt'>
(214860596, None)
2021-08-24 09:58:24,804 INFO [compile_hlg.py:83] Connecting LG after k2.determinize
2021-08-24 09:58:24,804 INFO [compile_hlg.py:86] Removing disambiguation symbols on LG
(214860596, None)
(214860596, None)
(214860596, None)
./prepare.sh: line 126: 23181 Killed                  ./local/compile_hlg.py --lang-dir data/lang_phone
2021-08-24 10:04:32,846 INFO [compile_hlg.py:128] Processing data/lang_bpe_5000
2021-08-24 10:04:33,354 INFO [lexicon.py:96] Loading pre-compiled data/lang_bpe_5000/Linv.pt
2021-08-24 10:04:33,526 INFO [compile_hlg.py:48] Building ctc_topo. max_token_id: 4999
2021-08-24 10:04:34,405 INFO [compile_hlg.py:53] Loading pre-compiled G_3_gram
2021-08-24 10:04:41,941 INFO [compile_hlg.py:68] Intersecting L and G
2021-08-24 10:07:20,234 INFO [compile_hlg.py:70] LG shape: (80262949, None)
2021-08-24 10:07:20,234 INFO [compile_hlg.py:72] Connecting LG
(80262949, None)
2021-08-24 10:07:20,234 INFO [compile_hlg.py:75] LG shape after k2.connect: (80262949, None)
(80262949, None)
2021-08-24 10:07:20,234 INFO [compile_hlg.py:77] <class 'torch.Tensor'>
2021-08-24 10:07:20,234 INFO [compile_hlg.py:78] Determinizing LG
./prepare.sh: line 122: 27422 Killed                  ./local/compile_hlg.py --lang-dir $lang_dir
'lang_bpe/lang_bpe_5000' -> 'lang_bpe_5000'
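The bare `Killed` lines above usually mean the kernel's OOM killer stopped the process rather than a crash inside k2. A small instrumentation sketch (an assumption, not part of the recipe) that could confirm this by logging peak memory after each stage:

```python
# Hypothetical peak-memory logging for compile_hlg.py: on Linux, ru_maxrss is
# reported in kilobytes, so divide by 1024 twice to get GB.
import logging
import resource

def log_peak_rss(tag: str) -> None:
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logging.info(f"{tag}: peak RSS = {peak_kb / 1024 / 1024:.1f} GB")
```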

I tried counting with IRSTLM instead of KenLM. When I pruned the KenLM model, the 3-gram was reduced from 5 GB to 2.1 GB. However, after pruning the IRSTLM model (with threshold 3e-7), the sizes are now 68 MB and 85 MB for the 3-gram and 4-gram respectively.

In the end, preparing G completed successfully.
