This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
☆23 Feb 22, 2022 Updated 4 years ago
Alternatives and similar repositories for german-wikipedia-text-corpus
Users who are interested in german-wikipedia-text-corpus are comparing it to the libraries listed below.
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus. ☆28 Dec 15, 2019 Updated 6 years ago
- Wikipedia text corpus for self-supervised NLP model training ☆46 Jul 17, 2022 Updated 3 years ago
- German Dataset for Legal Information Retrieval ☆26 Feb 26, 2024 Updated 2 years ago
- Python port for IWNLP.Lemmatizer ☆19 Apr 13, 2026 Updated last month
- Plan and train German transformer models. ☆23 Feb 22, 2021 Updated 5 years ago
- German lemmatization with IWNLP as extension for spaCy ☆27 Apr 13, 2026 Updated last month
- Python code to automatically produce a summary of a piece of text. ☆12 Sep 8, 2016 Updated 9 years ago
- An opinionated NLP research template ☆10 Aug 29, 2024 Updated last year
- A small Python-based build file generator targeting the Ninja build system ☆46 Dec 21, 2016 Updated 9 years ago
- Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages". ☆13 Sep 17, 2021 Updated 4 years ago
- LM pretraining for generation, reading list, resources, conference mappings. ☆19 Feb 25, 2020 Updated 6 years ago
- Collection of Twitter-related helper functions for Python. ☆14 Feb 24, 2026 Updated 2 months ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models ☆158 Dec 6, 2022 Updated 3 years ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation ☆12 Aug 29, 2024 Updated last year
- A Smalltalk Web Browser for Squeak/Smalltalk ☆18 Apr 18, 2022 Updated 4 years ago
- ☆16 Jun 14, 2024 Updated last year
- ☆14 Jan 6, 2025 Updated last year
- ☆10 May 5, 2017 Updated 9 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer ☆10 Oct 15, 2019 Updated 6 years ago
- Poems retrieval demo built with GNES framework ☆14 Oct 3, 2019 Updated 6 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆29 Apr 17, 2024 Updated 2 years ago
- Machine Learning Toolbox 2 ☆13 Nov 22, 2025 Updated 5 months ago
- General information about DEEP BERLIN's AI for Good Hackathon 2020 ☆11 Apr 14, 2020 Updated 6 years ago
- Text normalization before TTS: converts digits and letters into Chinese characters ☆12 Apr 27, 2024 Updated 2 years ago
- Tutorial and talk about the Reasonable Ontology Language at the Knowledge Graph Conference 2022. ☆12 May 9, 2023 Updated 3 years ago
- ☆13 Aug 13, 2020 Updated 5 years ago
- ☆22 Dec 15, 2023 Updated 2 years ago
- Winter Break Collaboratory DS Boot Camp during the academic year of 2017-2018 ☆14 Feb 12, 2018 Updated 8 years ago
- Specialization of BERT architecture both for the Spanish language and the Twitter domain ☆13 Nov 6, 2020 Updated 5 years ago
- Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers. ☆16 Sep 25, 2024 Updated last year
- Easy access to administrative boundary data with Python ☆17 Oct 4, 2022 Updated 3 years ago
- A bridge between Pydantic V2 models and RDF graphs ☆19 May 3, 2025 Updated last year
- Poetry Corpora Annotated on Aesthetic Emotions ☆13 Aug 2, 2022 Updated 3 years ago
- ☆14 Jun 2, 2021 Updated 4 years ago
- A really fast document ranking engine using BM25 and TF-IDF. Built in Python with the NLP packages NLTK and spaCy. ☆17 May 8, 2018 Updated 8 years ago
- A dataset of semantically related sentence pairs in the German legal domain ☆10 Feb 26, 2021 Updated 5 years ago
- A tool to create dependency graphs of ideas (useful for presentation or teaching) ☆12 Oct 31, 2024 Updated last year
- A tokenizer and sentence splitter for German and English web and social media texts. ☆153 Dec 9, 2024 Updated last year
- Codebase describing experiments in Truncation Sampling as Language Model Desmoothing ☆13 Dec 6, 2022 Updated 3 years ago