t-systems-on-site-services-gmbh / german-wikipedia-text-corpus

This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
23 · Updated 2 years ago
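
As a rough illustration of how a sentence-split corpus like this is typically consumed before embedding training, the sketch below streams sentences and builds token counts. It assumes one sentence per line with blank lines between articles, which is an assumption about the file layout and not something this listing confirms. The names `iter_sentences` and `count_tokens` and the sample text are hypothetical.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, List


def iter_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one whitespace-tokenized sentence per non-empty line."""
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines (assumed here to separate articles)
            yield tokens


def count_tokens(sentences: Iterable[List[str]]) -> Counter:
    """Count how often each token occurs across all sentences."""
    counts: Counter = Counter()
    for tokens in sentences:
        counts.update(tokens)
    return counts


# Hypothetical two-sentence sample standing in for the real corpus file.
sample = io.StringIO(
    "Berlin ist die Hauptstadt von Deutschland .\n"
    "\n"
    "Die Stadt liegt an der Spree .\n"
)
sentences = list(iter_sentences(sample))
vocab = count_tokens(sentences)
```

With the real corpus, the `io.StringIO` sample would be replaced by an opened file handle. Lists of tokens in this shape are the kind of input that embedding trainers such as gensim's `FastText` accept, though the exact training setup is left open here.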

Alternatives and similar repositories for german-wikipedia-text-corpus:

Users who are interested in german-wikipedia-text-corpus are comparing it to the repositories listed below.