google-research-datasets/wiki-split

One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.

☆125

Alternatives and similar repositories for wiki-split

Users who are interested in wiki-split are comparing it to the repositories listed below.

google-research-datasets / wiki-atomic-edits
A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contai…
☆105 · May 6, 2019 · Updated 7 years ago
shashiongithub / Split-and-Rephrase
The WebSplit Benchmark introducing "Split and Rephrase" task
☆62 · Sep 26, 2018 · Updated 7 years ago
google-research-datasets / discofuse
☆32 · Jun 16, 2021 · Updated 5 years ago
google-research-datasets / query-wellformedness
25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural languag…
☆85 · Oct 9, 2018 · Updated 7 years ago
allenai / allennlp-reading-comprehension-research
☆41 · Feb 12, 2019 · Updated 7 years ago
shmsw25 / bart-closed-book-qa
A BART version of an open-domain QA model in a closed-book setup
☆118 · Aug 13, 2020 · Updated 5 years ago
google-research-datasets / paws
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…
☆571 · Jan 4, 2022 · Updated 4 years ago
nyu-mll / spinn
NYU ML² work on sentence encoding with tree structure and dynamic graphs
☆108 · Dec 3, 2018 · Updated 7 years ago
cocoxu / simplification
Text Simplification System and Dataset
☆124 · Jul 7, 2023 · Updated 3 years ago
northanapon / dict-definition
Preprocessing scripts to read definitions and other information from dictionaries
☆23 · Nov 7, 2017 · Updated 8 years ago
neulab / lrlm
Code for the paper "Latent Relation Language Models" at AAAI-20.
☆41 · Sep 22, 2025 · Updated 10 months ago
sgraaf / Replicate-Toronto-BookCorpus
This repository contains code to replicate the no-longer publicly available Toronto BookCorpus dataset
☆49 · Apr 6, 2022 · Updated 4 years ago
facebookresearch / DME
Dynamic Meta-Embeddings for Improved Sentence Representations
☆333 · Sep 25, 2020 · Updated 5 years ago
yseokchoi / SejongTree2Dependency
A tool for converting the Sejong syntactically parsed corpus into dependency structures
☆10 · Sep 7, 2018 · Updated 7 years ago
dbd-challenge / dbdc3
☆10 · Aug 25, 2018 · Updated 7 years ago
allenai / acl2018-semantic-parsing-tutorial
Materials from the ACL 2018 tutorial on neural semantic parsing
☆405 · Jul 17, 2018 · Updated 8 years ago
kanekomasahiro / eb-gec
☆15 · Mar 15, 2022 · Updated 4 years ago
nyu-mll / CoLA-baselines
Baselines and corpus accompanying paper Neural Network Acceptability Judgments
☆58 · Mar 1, 2020 · Updated 6 years ago
xwhan / ProQA
Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval
☆43 · Jun 12, 2023 · Updated 3 years ago
harvardnlp / neural-template-gen
☆266 · Jun 9, 2022 · Updated 4 years ago
eliorsulem / SAMSA
Simplification Automatic evaluation Measure through Semantic Annotation
☆17 · Mar 11, 2019 · Updated 7 years ago
seominjoon / denspi
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
☆200 · Jul 6, 2023 · Updated 3 years ago
miyyer / scpn
syntactically controlled paraphrase networks
☆168 · Dec 30, 2018 · Updated 7 years ago
mbforbes / physical-commonsense
Do Neural Language Representations Learn Physical Commonsense?
☆22 · Dec 28, 2021 · Updated 4 years ago
glample / fastBPE
Fast BPE
☆677 · Jun 18, 2024 · Updated 2 years ago
jmhessel / multi-retrieval
Code for Unsupervised Discovery of Multimodal Links in Multi-Image/Multi-Sentence Documents
☆30 · Jul 22, 2020 · Updated 6 years ago
tatHi / optok
☆10 · Aug 26, 2021 · Updated 4 years ago
facebookresearch / unlikelihood_training
Neural Text Generation with Unlikelihood Training
☆311 · Aug 31, 2021 · Updated 4 years ago
cooelf / Paper_Writing_Tips
☆12 · Apr 25, 2022 · Updated 4 years ago
feralvam / easse
Easier Automatic Sentence Simplification Evaluation
☆167 · Sep 25, 2023 · Updated 2 years ago
THU-KEG / MAVEN-Argument
Completing the Puzzle of All-in-One Event Understanding Benchmark with Event Arguments
☆14 · Mar 12, 2024 · Updated 2 years ago
nttcslab-nlp / doc_lm
☆11 · Jan 9, 2019 · Updated 7 years ago
google / active-qa
☆344 · Dec 11, 2018 · Updated 7 years ago
google-research / lasertagger
☆603 · Mar 12, 2026 · Updated 4 months ago
neulab / compare-mt
A tool for holistic analysis of language generation systems
☆471 · Sep 22, 2025 · Updated 10 months ago
idiap / wmil-sgd
Weighted multiple-instance learning algorithm based on stochastic gradient descent
☆12 · Feb 22, 2019 · Updated 7 years ago
jmzhao / bag-of-substring-embedder
☆16 · May 8, 2020 · Updated 6 years ago
salesforce / cove
☆471 · Feb 12, 2022 · Updated 4 years ago
nyu-dl / dl4ir-searchQA
☆181 · Aug 17, 2018 · Updated 7 years ago