tscheepers / Wikipedia-Summary-Dataset
This dataset contains all titles and summaries (or introductions) of English Wikipedia articles, extracted in September 2017. It could be useful if one wants to use the smaller, more concise, and more definitional summaries in their research, or if one just wants to use a smaller but still diverse dataset for efficient training with resource …
☆55 Updated 6 years ago
Related projects
Alternatives and complementary repositories for Wikipedia-Summary-Dataset
- Unsupervised sentence summarization by contextual matching ☆47 Updated 2 years ago
- This repo contains the code for our paper "EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit E… ☆57 Updated 4 years ago
- Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o… ☆102 Updated 11 months ago
- This repository contains the data and code for the paper "An Empirical Comparison on Imitation Learning and Reinforcement Learning for Pa… ☆81 Updated 4 years ago
- One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits. ☆124 Updated 5 years ago
- Text Simplification System and Dataset ☆123 Updated last year
- Multi-stage passage ranking: monoBERT + duoBERT ☆112 Updated 4 years ago
- This dataset contains naturally-occurring English sentences that feature non-trivial noun-verb ambiguity. ☆35 Updated 5 years ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant. ☆31 Updated 4 years ago
- Repository for KPTimes corpus ☆34 Updated 2 years ago
- Use BERT to Fill in the Blanks ☆82 Updated 2 years ago
- ACL 2020 Unsupervised Opinion Summarization as Copycat-Review Generation ☆100 Updated last year
- This is the reference implementation of commonly used coreference metrics. ☆74 Updated 6 years ago
- Contains data/code for the paper "Neural Syntactic Preordering for Controlled Paraphrase Generation" (ACL 2020). ☆76 Updated 3 months ago
- Full Python implementation of the ROUGE metric, producing same results as in the official perl implementation. ☆157 Updated 5 years ago
- Cross-Lingual Alignment of Contextual Word Embeddings ☆98 Updated 4 years ago
- NAACL 2019: Submodular optimization-based diverse paraphrasing and its effectiveness in data augmentation ☆70 Updated 8 months ago
- Codebase for probing and visualizing multilingual models. ☆45 Updated 4 years ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal… ☆81 Updated 3 years ago
- Generalizing Natural Language Analysis through Span-relation Representations ☆90 Updated last year
- Data and code for Kang et al., EMNLP 2019's paper titled "(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Ann… ☆29 Updated 4 years ago
- pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference ☆61 Updated last year
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning ☆98 Updated 3 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆54 Updated 2 years ago
- An extension of word2vec to learn phrase embeddings ☆73 Updated 6 years ago