This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo deep contextualized word representations.
☆23Feb 22, 2022Updated 4 years ago
Alternatives and similar repositories for german-wikipedia-text-corpus
Users that are interested in german-wikipedia-text-corpus are comparing it to the libraries listed below.
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.☆28Dec 15, 2019Updated 6 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆39Dec 14, 2021Updated 4 years ago
- 🔁 Async JSON-RPC 2.0 protocol + server powered by asyncio & py35+. json-rpc successor.☆22Jul 21, 2023Updated 2 years ago
- Plan and train German transformer models.☆23Feb 22, 2021Updated 5 years ago
- German lemmatization with IWNLP as extension for spaCy☆27Apr 13, 2026Updated 2 weeks ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus☆20May 21, 2024Updated last year
- Python code to automatically produce a summary of a piece of text.☆12Sep 8, 2016Updated 9 years ago
- Ukrainian ELECTRA model☆12Mar 11, 2023Updated 3 years ago
- ☆11Mar 15, 2024Updated 2 years ago
- dnsmasq docker image, fully configurable through ENV☆32Apr 23, 2026Updated last week
- An opinionated NLP research template☆10Aug 29, 2024Updated last year
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat…☆168Dec 29, 2024Updated last year
- A merged version of multiple open-source German speech datasets.☆34May 3, 2024Updated last year
- A simple BNF parser generator for Python. Note: in development!☆27Mar 6, 2018Updated 8 years ago
- Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages".☆13Sep 17, 2021Updated 4 years ago
- Collection of Twitter-related helper functions for python.☆14Feb 24, 2026Updated 2 months ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models☆158Dec 6, 2022Updated 3 years ago
- A semantic versioning library for Python☆46May 11, 2022Updated 3 years ago
- ☆16Jun 14, 2024Updated last year
- ☆14Jan 6, 2025Updated last year
- ☆10May 5, 2017Updated 8 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer☆10Oct 15, 2019Updated 6 years ago
- A database of climate change newspaper articles☆16Jan 31, 2026Updated 3 months ago
- Poems retrieval demo built with GNES framework☆14Oct 3, 2019Updated 6 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆29Apr 17, 2024Updated 2 years ago
- General information about DEEP BERLIN's AI for Good Hackathon 2020☆11Apr 14, 2020Updated 6 years ago
- Text normalization before TTS: converts digits and letters into Chinese characters☆12Apr 27, 2024Updated 2 years ago
- Replication package for "Fine-grained prediction of food crises from news streams"☆10Jun 27, 2023Updated 2 years ago
- ☆13Aug 13, 2020Updated 5 years ago
- ☆22Dec 15, 2023Updated 2 years ago
- An example repo that demonstrates how to properly test Python code that interfaces with Elasticsearch.☆12Aug 26, 2020Updated 5 years ago
- Winter Break Collaboratory DS Boot Camp during the academic year of 2017-2018☆14Feb 12, 2018Updated 8 years ago
- Specialization of BERT architecture both for the Spanish language and the Twitter domain☆13Nov 6, 2020Updated 5 years ago
- Goldfish: Monolingual language models for 350 languages.☆24Mar 4, 2026Updated last month
- Poetry Corpora Annotated on Aesthetic Emotions☆12Aug 2, 2022Updated 3 years ago
- Django DB Backend Interface for ArangoDB☆10Dec 12, 2022Updated 3 years ago
- A software for transferring pre-trained English models to foreign languages☆19Mar 20, 2023Updated 3 years ago
- Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers.☆16Sep 25, 2024Updated last year