Wikipedia text corpus for self-supervised NLP model training
☆46, updated Jul 17, 2022
Alternatives and similar repositories for wikipedia2corpus
Users that are interested in wikipedia2corpus are comparing it to the libraries listed below.
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l… (☆23, updated Feb 22, 2022)
- A data set and model for German sentiment classification. (☆69, updated May 30, 2025)
- Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT. (☆15, updated Dec 25, 2019)
- BERT and ELECTRA models trained on Europeana Newspapers (☆39, updated Dec 14, 2021)
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022) (☆14, updated Jul 22, 2022)
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning (☆30, updated Jan 25, 2023)
- Alignment and annotation for comparable documents. (☆22, updated Oct 16, 2018)
- Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models" (☆14, updated Aug 19, 2022)
- [ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction (☆13, updated Apr 21, 2020)
- Codebase, data and models for the Re-Thinking the Shuffle Test paper at ACL2021 (☆10, updated Oct 14, 2022)
- Plan and train German transformer models. (☆23, updated Feb 22, 2021)
- Mirror of Apache OpenNLP Add-ons (☆19, updated May 18, 2026)
- Polish data. (☆13, updated May 6, 2026)
- ☆18, updated Feb 1, 2023
- ☆12, updated Oct 17, 2022
- ☆14, updated Mar 31, 2024
- MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer (☆40, updated Jun 7, 2022)
- Ukrainian ELECTRA model (☆12, updated Mar 11, 2023)
- Legal Reference Extraction (☆47, updated May 12, 2026)
- Tools relating to the CC-News-En Collection (☆20, updated Dec 8, 2023)
- ☆12, updated Oct 2, 2022
- ☆10, updated Mar 29, 2021
- ☆12, updated Apr 29, 2024
- An SDK and library that is used in several Deutsche Telekom mobile apps (☆12, updated Sep 23, 2024)
- German stopwords collection (☆108, updated Oct 6, 2022)
- Curriculum training (☆22, updated Jun 25, 2025)
- Compound splitter for the German language ("Komposita-Zerlegung") based on a large dictionary combined with highly efficient multi-pattern stri… (☆35, updated Jul 7, 2022)
- This repo is meant to serve as a guide for Machine Learning/AI technical interviews. (☆11, updated Mar 5, 2024)
- Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Cohere… (☆11, updated Oct 16, 2022)
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation… (☆10, updated Nov 4, 2022)
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… (☆45, updated Aug 10, 2024)
- GraphOfDocs: Representing multiple documents as a single graph (☆21, updated Jun 22, 2022)
- Data for discourse connective prediction. (☆12, updated May 3, 2018)
- SMiLER - Samsung MultiLingual Entity and Relation Extraction dataset (☆18, updated Feb 11, 2021)
- ☆11, updated Dec 8, 2022
- German GPT-2 model (☆32, updated Aug 17, 2021)
- Breaks a word into syllables using an LSTM-based neural network. (☆20, updated Aug 14, 2023)
- GERNERMED is the first open neural NER model for medical entities designed for German data. (☆18, updated Oct 20, 2023)
- Source code of the paper "Do Syntax Trees Help Pre-trained Transformers Extract Information?" (EACL 2021) (☆75, updated Dec 29, 2021)