This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings such as fastText or ELMo deep contextualized word representations.
☆23Feb 22, 2022Updated 4 years ago
Alternatives and similar repositories for german-wikipedia-text-corpus
Users that are interested in german-wikipedia-text-corpus are comparing it to the libraries listed below.
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.☆28Dec 15, 2019Updated 6 years ago
- Wikipedia text corpus for self-supervised NLP model training☆47Jul 17, 2022Updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆39Dec 14, 2021Updated 4 years ago
- German Dataset for Legal Information Retrieval☆27Feb 26, 2024Updated 2 years ago
- Provides functions for converting 모두의 말뭉치 (Modu Corpus) data into a format convenient for analysis.☆11Mar 2, 2022Updated 4 years ago
- 🔁 Async JSON-RPC 2.0 protocol + server powered by asyncio & py35+. json-rpc successor.☆22Jul 21, 2023Updated 2 years ago
- ☆16Apr 2, 2021Updated 5 years ago
- Plan and train German transformer models.☆23Feb 22, 2021Updated 5 years ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus☆20May 21, 2024Updated 2 years ago
- Python code to automatically produce a summary of a piece of text.☆11Sep 8, 2016Updated 9 years ago
- Ukrainian ELECTRA model☆12Mar 11, 2023Updated 3 years ago
- ☆12Mar 15, 2024Updated 2 years ago
- An opinionated NLP research template☆10Aug 29, 2024Updated last year
- An SDK and Library that is used in several Deutsche Telekom mobile apps☆12Sep 23, 2024Updated last year
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- A small Python-based build file generator targeting the Ninja build system☆46Dec 21, 2016Updated 9 years ago
- A simple BNF parser generator for Python. Note: in development!☆27Mar 6, 2018Updated 8 years ago
- Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages".☆13Sep 17, 2021Updated 4 years ago
- LM pretraining for generation, reading list, resources, conference mappings.☆19Feb 25, 2020Updated 6 years ago
- Collection of Twitter-related helper functions for Python.☆14Feb 24, 2026Updated 4 months ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models☆158Dec 6, 2022Updated 3 years ago
- Extract data from German Wiktionary XML files.☆26May 29, 2026Updated last month
- Lowering PyTorch's Memory Consumption for Selective Differentiation☆12Aug 29, 2024Updated last year
- A semantic versioning library for Python☆46May 11, 2022Updated 4 years ago
- A Smalltalk Web Browser for Squeak/Smalltalk☆18Apr 18, 2022Updated 4 years ago
- A function invocation framework for Python☆11Feb 21, 2024Updated 2 years ago
- ☆16Jun 14, 2024Updated 2 years ago
- ☆10May 5, 2017Updated 9 years ago
- A database of climate change newspaper articles☆16Jan 31, 2026Updated 5 months ago
- Poems retrieval demo built with GNES framework☆14Oct 3, 2019Updated 6 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆29Apr 17, 2024Updated 2 years ago
- Machine Learning Toolbox 2☆13Nov 22, 2025Updated 7 months ago
- General information about DEEP BERLIN's AI for Good Hackathon 2020☆11Apr 14, 2020Updated 6 years ago
- code for "Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations"☆12Sep 7, 2020Updated 5 years ago
- Text normalization before TTS: converts digits and letters into Chinese characters.☆12Apr 27, 2024Updated 2 years ago
- ☆13Aug 13, 2020Updated 5 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component☆13Aug 10, 2023Updated 2 years ago
- An example repo that demonstrates how to properly test Python code that interfaces with Elasticsearch.☆12Aug 26, 2020Updated 5 years ago
- Winter Break Collaboratory DS Boot Camp during the academic year of 2017-2018☆14Feb 12, 2018Updated 8 years ago