This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and split into sentences. Its purpose is to train NLP embeddings such as fastText or ELMo deep contextualized word representations.
☆23Feb 22, 2022Updated 4 years ago
Alternatives and similar repositories for german-wikipedia-text-corpus
Users that are interested in german-wikipedia-text-corpus are comparing it to the libraries listed below.
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.☆28Dec 15, 2019Updated 6 years ago
- Wikipedia text corpus for self-supervised NLP model training☆46Jul 17, 2022Updated 3 years ago
- Tools for Optuna, MLflow and the integration of both.☆17May 28, 2023Updated 2 years ago
- German Dataset for Legal Information Retrieval☆25Feb 26, 2024Updated 2 years ago
- 🔁 Async JSON-RPC 2.0 protocol + server powered by asyncio & py35+. json-rpc successor.☆22Jul 21, 2023Updated 2 years ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus☆19May 21, 2024Updated last year
- German lemmatization with IWNLP as extension for spaCy☆27Jul 28, 2023Updated 2 years ago
- Python code to automatically produce a summary of a piece of text.☆12Sep 8, 2016Updated 9 years ago
- dnsmasq docker image, fully configurable through ENV☆32Mar 31, 2026Updated last week
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat…☆168Dec 29, 2024Updated last year
- A merged version of multiple open-source German speech datasets.☆34May 3, 2024Updated last year
- A small Python-based build file generator targeting the build system ninja☆46Dec 21, 2016Updated 9 years ago
- A simple BNF parser generator for Python. Note: in development!☆27Mar 6, 2018Updated 8 years ago
- LM pretraining for generation, reading list, resources, conference mappings.☆20Feb 25, 2020Updated 6 years ago
- Collection of Twitter-related helper functions for python.☆14Feb 24, 2026Updated last month
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models☆159Dec 6, 2022Updated 3 years ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation☆12Aug 29, 2024Updated last year
- A semantic versioning library for Python☆46May 11, 2022Updated 3 years ago
- ☆10May 5, 2017Updated 8 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer☆10Oct 15, 2019Updated 6 years ago
- Poems retrieval demo built with GNES framework☆14Oct 3, 2019Updated 6 years ago
- General information about DEEP BERLIN's AI for Good Hackathon 2020☆11Apr 14, 2020Updated 5 years ago
- Text normalization before TTS: converts digits and letters into Chinese characters☆12Apr 27, 2024Updated last year
- Replication package for "Fine-grained prediction of food crises from news streams"☆10Jun 27, 2023Updated 2 years ago
- Tutorial and talk about the Reasonable Ontology Language at the Knowledge Graph Conference 2022.☆12May 9, 2023Updated 2 years ago
- ☆13Aug 13, 2020Updated 5 years ago
- Winter Break Collaboratory DS Boot Camp during the academic year of 2017-2018☆14Feb 12, 2018Updated 8 years ago
- Specialization of BERT architecture both for the Spanish language and the Twitter domain☆13Nov 6, 2020Updated 5 years ago
- Goldfish: Monolingual language models for 350 languages.☆25Mar 4, 2026Updated last month
- A software for transferring pre-trained English models to foreign languages☆19Mar 20, 2023Updated 3 years ago
- Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers.☆16Sep 25, 2024Updated last year
- Easy access to administrative boundary data with python☆17Oct 4, 2022Updated 3 years ago
- A bridge between Pydantic V2 models and RDF graphs☆18May 3, 2025Updated 11 months ago
- A dataset of semantically related sentence pairs in the German legal domain☆10Feb 26, 2021Updated 5 years ago
- A tool to create dependency graphs of ideas (useful for presentation or teaching)☆12Oct 31, 2024Updated last year
- A tokenizer and sentence splitter for German and English web and social media texts.☆153Dec 9, 2024Updated last year
- Codebase describing experiments in Truncation Sampling as Language Model Desmoothing☆13Dec 6, 2022Updated 3 years ago
- Open Source Neural Machine Translation in PyTorch☆13Apr 29, 2023Updated 2 years ago