Language model training data #329

sbmaruf · 2021-03-09T01:58:02Z

As far as I understand, a "language model is trained with the stream of text". That means there is no grammatical boundary marking where a sentence starts and ends (e.g., full stop (.), exclamation mark (!)). I was wondering whether this introduces any noise.

So my question is: if I train a language model with versus without sentence boundaries, should I expect to see any difference in the following?

- In downstream task adaptation
- In text generation
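
To make the two settings concrete, here is a minimal sketch of what I mean. It is not taken from this repository: `stream_blocks`, the `EOS` token string, and the toy sentences are all hypothetical names chosen for illustration. It concatenates tokenized sentences into one stream and cuts fixed-size training blocks, either with or without a boundary token after each sentence.

```python
from typing import List

EOS = "</s>"  # hypothetical boundary token, chosen only for illustration


def stream_blocks(
    sentences: List[List[str]], block_size: int, keep_boundaries: bool
) -> List[List[str]]:
    """Concatenate tokenized sentences into one stream and cut fixed-size blocks."""
    stream: List[str] = []
    for sent in sentences:
        stream.extend(sent)
        if keep_boundaries:
            # Mark the end of each sentence explicitly in the stream.
            stream.append(EOS)
    # Tokens that do not fill a whole block at the end are dropped.
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


sentences = [["the", "cat", "sat"], ["dogs", "bark"], ["it", "rains", "today"]]

without_eos = stream_blocks(sentences, block_size=4, keep_boundaries=False)
with_eos = stream_blocks(sentences, block_size=4, keep_boundaries=True)

print(without_eos)
print(with_eos)
```

In both settings a block can start or end in the middle of a sentence. The only difference is whether the model sees an explicit marker where one sentence ends and the next begins.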

@glample @aconneau
