t-systems-on-site-services-gmbh / german-wikipedia-text-corpus

This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and split into sentences. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
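As a rough illustration of how a sentence-split corpus like this is typically consumed before embedding training, here is a minimal sketch. It assumes one sentence per line with whitespace-separated tokens, which the description does not confirm; the helper names `iter_sentences` and `build_vocab` and the sample text are hypothetical and not part of the repository.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, List


def iter_sentences(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one token list per non-empty line.

    Assumption: the corpus stores one sentence per line and tokens are
    separated by whitespace.
    """
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            yield tokens


def build_vocab(sentences: Iterable[List[str]], min_count: int = 1) -> Counter:
    """Count tokens and keep those seen at least `min_count` times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return Counter({tok: n for tok, n in counts.items() if n >= min_count})


# Stand-in for an opened corpus file; in practice this would be something
# like open("corpus.txt", encoding="utf-8") (hypothetical file name).
sample = io.StringIO(
    "Berlin ist die Hauptstadt von Deutschland .\n"
    "\n"
    "Die Stadt liegt an der Spree .\n"
)

sentences = list(iter_sentences(sample))
vocab = build_vocab(sentences)
```

Token lists of this shape are what embedding trainers that accept an iterable of tokenized sentences generally expect. Tools that read a plain text file directly can skip this step.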
