This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
☆23 · Feb 22, 2022 · Updated 4 years ago
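The description above says the corpus is sentence-split so that it can be used to train embeddings such as fastText. Below is a minimal sketch of how a sentence-split file might be streamed into token lists. It assumes a hypothetical layout: UTF-8 text, one whitespace-tokenized sentence per line. The repository's actual file names and format are not given here. `SentenceCorpus` and `vocabulary` are illustrative names and are not part of the project.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, List


class SentenceCorpus:
    """Re-iterable stream of tokenized sentences from a text file.

    Assumed (hypothetical) layout: UTF-8 text, one sentence per line,
    tokens separated by whitespace, blank lines allowed between articles.
    """

    def __init__(self, path: Path, lowercase: bool = False) -> None:
        self.path = Path(path)
        self.lowercase = lowercase

    def __iter__(self) -> Iterator[List[str]]:
        # Reopen the file on every pass so a trainer can iterate over it
        # once per epoch without loading the whole corpus into memory.
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank separator lines
                if self.lowercase:
                    line = line.lower()
                yield line.split()


def vocabulary(sentences: Iterable[List[str]], min_count: int = 1) -> Counter:
    """Count token frequencies, keeping tokens seen at least min_count times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return Counter({tok: n for tok, n in counts.items() if n >= min_count})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"  # hypothetical sample file
        sample.write_text(
            "Berlin ist die Hauptstadt .\n\nDie Stadt ist groß .\n",
            encoding="utf-8",
        )
        corpus = SentenceCorpus(sample)
        for sentence in corpus:
            print(sentence)
        print(vocabulary(corpus, min_count=2))
```

The sketch returns a re-iterable object instead of a one-shot generator because embedding trainers usually make several passes over the data. An iterable of token lists is also the input shape that gensim's `Word2Vec` and `FastText` classes accept through their `sentences` argument.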
Alternatives and similar repositories for german-wikipedia-text-corpus
Users that are interested in german-wikipedia-text-corpus are comparing it to the libraries listed below.
- This is a German ELMo model (deep contextualized word representations). It is trained on a special German Wikipedia text corpus. ☆28 · Dec 15, 2019 · Updated 6 years ago
- Wikipedia text corpus for self-supervised NLP model training ☆47 · Jul 17, 2022 · Updated 3 years ago
- German Dataset for Legal Information Retrieval ☆26 · Feb 26, 2024 · Updated 2 years ago
- Provides functions that convert 모두의 말뭉치 (Modu Corpus) data into a format convenient for analysis. ☆11 · Mar 2, 2022 · Updated 4 years ago
- Python port for IWNLP.Lemmatizer ☆19 · Apr 13, 2026 · Updated last month
- 🔁 Async JSON-RPC 2.0 protocol + server powered by asyncio & py35+. json-rpc successor. ☆22 · Jul 21, 2023 · Updated 2 years ago
- ☆16 · Apr 2, 2021 · Updated 5 years ago
- Plan and train German transformer models. ☆23 · Feb 22, 2021 · Updated 5 years ago
- German lemmatization with IWNLP as extension for spaCy ☆27 · Apr 13, 2026 · Updated last month
- ☆12 · Mar 15, 2024 · Updated 2 years ago
- dnsmasq docker image, fully configurable through ENV ☆32 · May 19, 2026 · Updated 3 weeks ago
- An opinionated NLP research template ☆10 · Aug 29, 2024 · Updated last year
- German GPT-2 model ☆32 · Aug 17, 2021 · Updated 4 years ago
- A merged version of multiple open-source German speech datasets. ☆34 · May 3, 2024 · Updated 2 years ago
- A small Python-based build file generator targeting the ninja build system ☆46 · Dec 21, 2016 · Updated 9 years ago
- A simple BNF parser generator for Python. Note: in development! ☆27 · Mar 6, 2018 · Updated 8 years ago
- Python source code for the EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages". ☆13 · Sep 17, 2021 · Updated 4 years ago
- Collection of Twitter-related helper functions for Python. ☆14 · Feb 24, 2026 · Updated 3 months ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models ☆158 · Dec 6, 2022 · Updated 3 years ago
- A Smalltalk Web Browser for Squeak/Smalltalk ☆18 · Apr 18, 2022 · Updated 4 years ago
- ☆10 · May 5, 2017 · Updated 9 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer ☆10 · Oct 15, 2019 · Updated 6 years ago
- A database of climate change newspaper articles ☆16 · Jan 31, 2026 · Updated 4 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆29 · Apr 17, 2024 · Updated 2 years ago
- Tutorial and talk about the Reasonable Ontology Language at the Knowledge Graph Conference 2022. ☆12 · May 9, 2023 · Updated 3 years ago
- ☆13 · Aug 13, 2020 · Updated 5 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆13 · Aug 10, 2023 · Updated 2 years ago
- ☆22 · Dec 15, 2023 · Updated 2 years ago
- An example repo that demonstrates how to properly test Python code that interfaces with Elasticsearch. ☆12 · Aug 26, 2020 · Updated 5 years ago
- A rolling version of Latent Dirichlet Allocation. ☆13 · Nov 27, 2023 · Updated 2 years ago
- Winter Break Collaboratory DS Boot Camp during the academic year of 2017-2018 ☆14 · Feb 12, 2018 · Updated 8 years ago
- Specialization of the BERT architecture for both the Spanish language and the Twitter domain ☆13 · Nov 6, 2020 · Updated 5 years ago
- Django DB Backend Interface for ArangoDB ☆10 · Dec 12, 2022 · Updated 3 years ago
- Software for transferring pre-trained English models to foreign languages ☆19 · Mar 20, 2023 · Updated 3 years ago
- Master's thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers. ☆16 · Sep 25, 2024 · Updated last year
- Easy access to administrative boundary data with Python ☆17 · Oct 4, 2022 · Updated 3 years ago
- Poetry Corpora Annotated on Aesthetic Emotions ☆13 · Aug 2, 2022 · Updated 3 years ago
- ☆14 · Jun 2, 2021 · Updated 5 years ago
- A really fast document ranking engine using BM25 and TF-IDF. Written in Python using the NLP packages NLTK and spaCy. ☆17 · May 8, 2018 · Updated 8 years ago