t-systems-on-site-services-gmbh / german-wikipedia-text-corpus
This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
☆23 · Feb 22, 2022 · Updated 3 years ago
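Because the corpus is described as sentence-split and meant for embedding training, the sketch below shows one way such a file might be streamed before training. The repository text does not confirm this, so the sketch assumes a UTF-8 plain-text file with one sentence per line and whitespace-separated tokens. The function names `iter_sentences` and `build_vocab` are hypothetical.

```python
from collections import Counter
from typing import Iterator, List


def iter_sentences(path: str) -> Iterator[List[str]]:
    """Yield one whitespace-tokenized sentence per non-empty line.

    Assumption (not confirmed by the repository): the corpus is a UTF-8
    text file with exactly one sentence per line.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip blank lines between articles or paragraphs
                yield tokens


def build_vocab(path: str, min_count: int = 1) -> Counter:
    """Count case-sensitive token frequencies, dropping tokens seen fewer than min_count times."""
    counts: Counter = Counter()
    for tokens in iter_sentences(path):
        counts.update(tokens)
    return Counter({tok: n for tok, n in counts.items() if n >= min_count})
```

Reading the file line by line keeps memory use flat, even for a Wikipedia-sized corpus. gensim's `Word2Vec` and `FastText` classes accept an iterable of token lists through their `sentences` parameter. They iterate over it more than once, so wrap `iter_sentences` in a restartable iterable rather than passing a one-shot generator.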
Alternatives and similar repositories for german-wikipedia-text-corpus
Users who are interested in german-wikipedia-text-corpus are comparing it to the repositories listed below.
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia text corpus. · ☆28 · Dec 15, 2019 · Updated 6 years ago
- Wikipedia text corpus for self-supervised NLP model training · ☆46 · Jul 17, 2022 · Updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers · ☆38 · Dec 14, 2021 · Updated 4 years ago
- ☆16 · Apr 2, 2021 · Updated 4 years ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus · ☆19 · May 21, 2024 · Updated last year
- Tools for Optuna, MLflow and the integration of both. · ☆17 · May 28, 2023 · Updated 2 years ago
- German Dataset for Legal Information Retrieval · ☆24 · Feb 26, 2024 · Updated last year
- Plan and train German transformer models. · ☆23 · Feb 22, 2021 · Updated 4 years ago
- A simple BNF parser generator for Python. Note: in development! · ☆27 · Mar 6, 2018 · Updated 7 years ago
- dnsmasq Docker image, fully configurable through ENV · ☆32 · Feb 1, 2026 · Updated 2 weeks ago
- Extract data from German Wiktionary XML files. · ☆26 · Jan 8, 2026 · Updated last month
- A merged version of multiple open-source German speech datasets. · ☆34 · May 3, 2024 · Updated last year
- A dead simple, insecure git-over-http server using nginx · ☆32 · Dec 16, 2021 · Updated 4 years ago
- An SDK and library that is used in several Deutsche Telekom mobile apps · ☆12 · Sep 23, 2024 · Updated last year
- A function invocation framework for Python · ☆11 · Feb 21, 2024 · Updated last year
- German GPT-2 model · ☆32 · Aug 17, 2021 · Updated 4 years ago
- Replication package for "Fine-grained prediction of food crises from news streams" · ☆10 · Jun 27, 2023 · Updated 2 years ago
- An opinionated NLP research template · ☆10 · Aug 29, 2024 · Updated last year
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models · ☆157 · Dec 6, 2022 · Updated 3 years ago
- ☆11 · Mar 15, 2024 · Updated last year
- Winter Break Collaboratory DS Boot Camp during the academic year 2017-2018 · ☆14 · Feb 12, 2018 · Updated 8 years ago
- A database of climate change newspaper articles · ☆16 · Jan 31, 2026 · Updated 2 weeks ago
- Ukrainian ELECTRA model · ☆12 · Mar 11, 2023 · Updated 2 years ago
- ☆14 · Jan 6, 2025 · Updated last year
- General information about DEEP BERLIN's AI for Good Hackathon 2020 · ☆11 · Apr 14, 2020 · Updated 5 years ago
- A rolling version of Latent Dirichlet Allocation. · ☆13 · Nov 27, 2023 · Updated 2 years ago
- Poetry Corpora Annotated on Aesthetic Emotions · ☆12 · Aug 2, 2022 · Updated 3 years ago
- An example repo that demonstrates how to properly test Python code that interfaces with Elasticsearch. · ☆12 · Aug 26, 2020 · Updated 5 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer · ☆10 · Oct 15, 2019 · Updated 6 years ago
- Specialization of BERT architecture both for the Spanish language and the Twitter domain · ☆13 · Nov 6, 2020 · Updated 5 years ago
- A bridge between Pydantic V2 models and RDF graphs · ☆18 · May 3, 2025 · Updated 9 months ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation · ☆12 · Aug 29, 2024 · Updated last year
- Code to create the dataset from "A New Aligned Simple German Corpus" · ☆11 · Jan 8, 2024 · Updated 2 years ago
- A ready-to-use, simple CMake + ANTLR starter with no dependencies. :+1: · ☆10 · Mar 28, 2025 · Updated 10 months ago
- Text normalization before TTS: converts digits and letters into Chinese characters · ☆12 · Apr 27, 2024 · Updated last year
- ☆13 · Jun 2, 2021 · Updated 4 years ago
- Runnerty quick start example project · ☆13 · Sep 6, 2024 · Updated last year
- Testing theories of sentence vectors on real world data · ☆11 · Jun 21, 2017 · Updated 8 years ago
- Machine Learning Toolbox 2 · ☆13 · Nov 22, 2025 · Updated 2 months ago