Wikipedia text corpus for self-supervised NLP model training
☆46 · Jul 17, 2022 · Updated 3 years ago
Alternatives and similar repositories for wikipedia2corpus
Users that are interested in wikipedia2corpus are comparing it to the libraries listed below.
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed, and sentence-split. Its purpose is to train NLP embeddings l… ☆23 · Feb 22, 2022 · Updated 4 years ago
- A data set and model for German sentiment classification. ☆69 · May 30, 2025 · Updated 11 months ago
- German Alpaca Dataset (Cleaned + Translated) ☆26 · Apr 6, 2023 · Updated 3 years ago
- Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT. ☆15 · Dec 25, 2019 · Updated 6 years ago
- Alignment and annotation for comparable documents. ☆22 · Oct 16, 2018 · Updated 7 years ago
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia text corpus. ☆28 · Dec 15, 2019 · Updated 6 years ago
- Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models" ☆14 · Aug 19, 2022 · Updated 3 years ago
- [ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction ☆13 · Apr 21, 2020 · Updated 6 years ago
- Codebase, data and models for the Re-Thinking the Shuffle Test paper at ACL 2021 ☆10 · Oct 14, 2022 · Updated 3 years ago
- Crossword puzzle generator and publisher using the Constraint Satisfaction Problem (CSP) technique, with minimal backtracking. ☆19 · Mar 29, 2019 · Updated 7 years ago
- Plan and train German transformer models. ☆23 · Feb 22, 2021 · Updated 5 years ago
- Mirror of Apache OpenNLP Add-ons ☆19 · Apr 14, 2026 · Updated 3 weeks ago
- Polish data. ☆13 · Apr 22, 2026 · Updated 2 weeks ago
- ☆18 · Feb 1, 2023 · Updated 3 years ago
- ☆12 · Oct 17, 2022 · Updated 3 years ago
- Python code to automatically produce a summary of a piece of text. ☆12 · Sep 8, 2016 · Updated 9 years ago
- ☆14 · Mar 31, 2024 · Updated 2 years ago
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 3 years ago
- Legal Reference Extraction ☆46 · Apr 22, 2026 · Updated 2 weeks ago
- ☆10 · Mar 29, 2021 · Updated 5 years ago
- ML pipeline and web app for classifying disaster response messages. ☆10 · Oct 6, 2018 · Updated 7 years ago
- ☆12 · Apr 29, 2024 · Updated 2 years ago
- German stopwords collection ☆88 · Oct 6, 2022 · Updated 3 years ago
- Python project to fetch Twitter data for some interesting analyses ☆13 · Dec 7, 2020 · Updated 5 years ago
- PyTorch implementation of Google TCAV ☆10 · Jan 11, 2019 · Updated 7 years ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl… ☆20 · Jun 5, 2025 · Updated 11 months ago
- ☆17 · Nov 23, 2021 · Updated 4 years ago
- Curriculum training ☆22 · Jun 25, 2025 · Updated 10 months ago
- Process your massive word2vec binary model file as a readable stream of records ☆11 · Jan 28, 2018 · Updated 8 years ago
- ☆24 · Jun 12, 2023 · Updated 2 years ago
- A small and fast S3 client without the clutter. ☆40 · Apr 28, 2026 · Updated last week
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation… ☆10 · Nov 4, 2022 · Updated 3 years ago
- Data for discourse connective prediction. ☆12 · May 3, 2018 · Updated 8 years ago
- SMiLER - Samsung MultiLingual Entity and Relation Extraction dataset ☆18 · Feb 11, 2021 · Updated 5 years ago
- Building an effective preprocessing tool for African languages ☆12 · Jan 24, 2024 · Updated 2 years ago
- The NLPStatTest project ☆12 · Mar 12, 2022 · Updated 4 years ago
- Poem retrieval demo built with the GNES framework ☆14 · Oct 3, 2019 · Updated 6 years ago
- ☆11 · Dec 8, 2022 · Updated 3 years ago
- German GPT-2 model ☆32 · Aug 17, 2021 · Updated 4 years ago