Wikipedia text corpus for self-supervised NLP model training
☆46Jul 17, 2022Updated 3 years ago
Alternatives and similar repositories for wikipedia2corpus
Users who are interested in wikipedia2corpus are comparing it to the repositories listed below
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l…☆23Feb 22, 2022Updated 4 years ago
- Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT.☆15Dec 25, 2019Updated 6 years ago
- Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models"☆14Aug 19, 2022Updated 3 years ago
- This repository contains code for the paper "Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs" (Wang, Lawrence…☆17Mar 8, 2021Updated 4 years ago
- A dataset for realistic evaluation of noisy label methods☆14Dec 3, 2023Updated 2 years ago
- [ACL 20] Probing Linguistic Features of Sentence-level Representations in Neural Relation Extraction☆13Apr 21, 2020Updated 5 years ago
- Klexikon: A German Dataset for Joint Summarization and Simplification☆17Oct 5, 2022Updated 3 years ago
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022)☆14Jul 22, 2022Updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆38Dec 14, 2021Updated 4 years ago
- Tools relating to the CC-News-En Collection☆20Dec 8, 2023Updated 2 years ago
- German Alpaca Dataset (Cleaned + Translated)☆26Apr 6, 2023Updated 2 years ago
- Repo for the simplified text alignment tools.☆21Dec 4, 2020Updated 5 years ago
- ☆17Feb 1, 2023Updated 3 years ago
- DWIE (Deutsche Welle corpus for Information Extraction) dataset. Introduced in our "DWIE: an entity-centric dataset for multi-task docume…☆52Jul 23, 2023Updated 2 years ago
- Alignment and annotation for comparable documents.☆22Oct 16, 2018Updated 7 years ago
- ☆24Jun 12, 2023Updated 2 years ago
- Plan and train German transformer models.☆23Feb 22, 2021Updated 5 years ago
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.☆28Dec 15, 2019Updated 6 years ago
- The Art and Science of Empirical Computer Science (Fall 2022)☆21Sep 1, 2023Updated 2 years ago
- Open Source / ENTSUM: A Data Set for Entity-Centric Extractive Summarization☆29May 23, 2022Updated 3 years ago
- Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Cohere…☆11Oct 16, 2022Updated 3 years ago
- A first cut into exploring the use of dependency links for building Text Graphs, that, among other things, with help of a centrality algo…☆32Oct 20, 2023Updated 2 years ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri…☆35Jul 7, 2022Updated 3 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w…☆31Jan 14, 2023Updated 3 years ago
- Source code of the paper "Do Syntax Trees Help Pre-trained Transformers Extract Information?" (EACL 2021)☆75Dec 29, 2021Updated 4 years ago
- EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling☆34Nov 21, 2021Updated 4 years ago
- Comprehensive evaluation framework for Open Information Extraction.☆40Jun 21, 2022Updated 3 years ago
- Legal Reference Extraction☆43Feb 13, 2026Updated 3 weeks ago
- Wikidata Live Changes - Group Project - 2020☆10Apr 23, 2024Updated last year
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning☆13Aug 17, 2023Updated 2 years ago
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- ☆40May 4, 2024Updated last year
- This is a sound source definition file of Domino for Roland SD-80.☆10May 1, 2016Updated 9 years ago
- Fake news detector using the LIAR dataset.☆11Aug 19, 2019Updated 6 years ago
- Wikimedia Enterprise - client SDK in Python☆20Nov 11, 2025Updated 3 months ago
- Python package for Geometric / Clifford Algebra with Pytorch.☆14Jan 25, 2026Updated last month
- An open source 3d slide presentation for the Godot Engine☆11Aug 3, 2017Updated 8 years ago
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation…☆10Nov 4, 2022Updated 3 years ago
- Midi2PLAY is an application that helps the process of converting MIDI files (.mid) making them compatible with the syntax accepted by the…☆10Dec 30, 2021Updated 4 years ago