This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Language model training data #329

Open
sbmaruf opened this issue Mar 9, 2021 · 0 comments


sbmaruf commented Mar 9, 2021

As far as I understand, the language model is trained on a "stream of text". That means training samples do not respect grammatical sentence boundaries: a sample need not begin at the start of a sentence or end at a full stop (.) or exclamation mark (!). I was wondering whether this introduces any noise.

So my question is: if I train a language model with versus without sentence boundaries, should I expect to see any difference

  1. In downstream task adaptation
  2. In text generation
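
To make the two setups concrete, here is a minimal sketch of what I mean. It is not the repository's actual data loader. The names `stream_chunks`, `sentence_samples`, `EOS`, and `block_size` are hypothetical and only for illustration, and whitespace splitting stands in for real tokenization.

```python
from typing import List

EOS = "</s>"  # hypothetical end-of-sentence marker, assumed for this sketch


def stream_chunks(sentences: List[List[str]], block_size: int) -> List[List[str]]:
    """Stream mode: join all sentences into one token stream, then cut
    fixed-size blocks. Blocks can start and end in the middle of a sentence."""
    stream = [tok for sent in sentences for tok in sent + [EOS]]
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def sentence_samples(sentences: List[List[str]], block_size: int) -> List[List[str]]:
    """Sentence mode: one sample per sentence, truncated to block_size.
    Every sample starts at a sentence start."""
    return [(sent + [EOS])[:block_size] for sent in sentences]


sentences = [
    "the cat sat on the mat .".split(),
    "it was warm !".split(),
]

for chunk in stream_chunks(sentences, block_size=5):
    print("stream  :", chunk)
for sample in sentence_samples(sentences, block_size=8):
    print("sentence:", sample)
```

In the stream version the second block starts with `mat` and ends with `was`, so it spans two sentences and begins mid-sentence. In the sentence version each sample is aligned to a sentence start.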

@glample @aconneau
