This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo (deep contextualized word representations).
☆23 · Feb 22, 2022 · Updated 4 years ago
Alternatives and similar repositories for german-wikipedia-text-corpus
Users who are interested in german-wikipedia-text-corpus are comparing it to the repositories listed below
- Wikipedia text corpus for self-supervised NLP model training ☆46 · Jul 17, 2022 · Updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆38 · Dec 14, 2021 · Updated 4 years ago
- ☆16 · Apr 2, 2021 · Updated 4 years ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus ☆19 · May 21, 2024 · Updated last year
- 🔁 Async JSON-RPC 2.0 protocol + server powered by asyncio & py35+. json-rpc successor. ☆21 · Jul 21, 2023 · Updated 2 years ago
- Tools for Optuna, MLflow and the integration of both. ☆17 · May 28, 2023 · Updated 2 years ago
- Plan and train German transformer models. ☆23 · Feb 22, 2021 · Updated 5 years ago
- A merged version of multiple open-source German speech datasets. ☆34 · May 3, 2024 · Updated last year
- A dead simple, insecure git-over-http server using nginx ☆33 · Dec 16, 2021 · Updated 4 years ago
- Python code to automatically produce a summary of a piece of text. ☆12 · Sep 8, 2016 · Updated 9 years ago
- A PEP 503-compliant Python package index specifically providing wheels built for Alpine Linux ☆39 · Updated this week
- German GPT-2 model ☆32 · Aug 17, 2021 · Updated 4 years ago
- Django DB Backend Interface for ArangoDB ☆10 · Dec 12, 2022 · Updated 3 years ago
- An opinionated NLP research template ☆10 · Aug 29, 2024 · Updated last year
- Collection of Twitter-related helper functions for python. ☆13 · Feb 24, 2026 · Updated 2 weeks ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models ☆159 · Dec 6, 2022 · Updated 3 years ago
- Winter Break Collaboratory DS Boot Camp during the academic year of 2017-2018 ☆14 · Feb 12, 2018 · Updated 8 years ago
- Poetry Corpora Annotated on Aesthetic Emotions ☆12 · Aug 2, 2022 · Updated 3 years ago
- Tutorial and talk about the Reasonable Ontology Language at the Knowledge Graph Conference 2022. ☆12 · May 9, 2023 · Updated 2 years ago
- General information about DEEP BERLIN's AI for Good Hackathon 2020 ☆11 · Apr 14, 2020 · Updated 5 years ago
- A rolling version of the Latent Dirichlet Allocation. ☆13 · Nov 27, 2023 · Updated 2 years ago
- A database of climate change newspaper articles ☆16 · Jan 31, 2026 · Updated last month
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆13 · Aug 10, 2023 · Updated 2 years ago
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 2 years ago
- ☆13 · Jun 2, 2021 · Updated 4 years ago
- Provides functionality for converting 모두의 말뭉치 (Modu Corpus) data into a form that is convenient for analysis. ☆11 · Mar 2, 2022 · Updated 4 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer ☆10 · Oct 15, 2019 · Updated 6 years ago
- Specialization of BERT architecture both for the Spanish language and the Twitter domain ☆13 · Nov 6, 2020 · Updated 5 years ago
- A bridge between Pydantic V2 models and RDF graphs ☆18 · May 3, 2025 · Updated 10 months ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation ☆12 · Aug 29, 2024 · Updated last year
- A DH abstracts conversion tool ☆13 · Mar 18, 2025 · Updated 11 months ago
- A tool to create dependency graphs of ideas (useful for presentation or teaching) ☆12 · Oct 31, 2024 · Updated last year
- ☆13 · Aug 13, 2020 · Updated 5 years ago
- Code to create the dataset from "A New Aligned Simple German Corpus" ☆12 · Jan 8, 2024 · Updated 2 years ago
- Codebase describing experiments in Truncation Sampling as Language Model Desmoothing ☆13 · Dec 6, 2022 · Updated 3 years ago
- Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages". ☆13 · Sep 17, 2021 · Updated 4 years ago
- The code for domain-robust language identification with adversarial loss ☆15 · May 29, 2018 · Updated 7 years ago
- Connector with single-request transactions for Neo4j 3.0 and above ☆14 · Jul 25, 2020 · Updated 5 years ago
- Poems retrieval demo built with GNES framework ☆14 · Oct 3, 2019 · Updated 6 years ago