Training language models using large text corpora