google-research-datasets/wiki-atomic-edits

A dataset of atomic Wikipedia edits, each an insertion or deletion of a contiguous chunk of text in a sentence. The dataset contains ~43 million edits across 8 languages.
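
To make the notion of an atomic edit concrete, here is a minimal sketch in Python. It assumes whitespace tokenization and checks whether one sentence can be obtained from another by inserting a single contiguous run of tokens. The function name `find_atomic_insertion` and the example sentences are hypothetical; this is not the dataset's actual extraction pipeline or file format.

```python
from typing import Optional, Tuple


def find_atomic_insertion(base: str, edited: str) -> Optional[Tuple[int, str]]:
    """Return (token_index, inserted_phrase) if `edited` equals `base` plus one
    contiguous run of inserted tokens; otherwise return None.

    Assumption: sentences are tokenized on whitespace (illustrative only).
    """
    base_tokens = base.split()
    edited_tokens = edited.split()
    if len(edited_tokens) <= len(base_tokens):
        return None

    # Length of the longest common prefix of the two token lists.
    prefix = 0
    while prefix < len(base_tokens) and base_tokens[prefix] == edited_tokens[prefix]:
        prefix += 1

    # The remaining base tokens must match the tail of the edited sentence.
    suffix_len = len(base_tokens) - prefix
    if suffix_len and base_tokens[prefix:] != edited_tokens[-suffix_len:]:
        return None

    # Everything between the shared prefix and shared suffix is the insertion.
    inserted = edited_tokens[prefix:len(edited_tokens) - suffix_len]
    return prefix, " ".join(inserted)


if __name__ == "__main__":
    print(find_atomic_insertion("She moved to Paris .", "She moved to Paris in 2010 ."))
```

A deletion can be detected with the same function by swapping the arguments, since deleting a phrase from one sentence is the same as inserting it into the other.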

☆105

Alternatives and similar repositories for wiki-atomic-edits

Users who are interested in wiki-atomic-edits are comparing it to the repositories listed below.

google-research-datasets / wiki-split
One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.
☆125 · Jun 3, 2019 · Updated 7 years ago
diyiy / Wiki_Semantic_Intention
Predict edit intentions on Wikipedia
☆19 · Jan 24, 2019 · Updated 7 years ago
neubig / howtocode-2017
An example of DyNet autobatching for the NIPS "how to code a paper" workshop
☆12 · Dec 9, 2017 · Updated 8 years ago
google-research-datasets / discofuse
☆32 · Jun 16, 2021 · Updated 5 years ago
xwhan / ProQA
Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval
☆43 · Jun 12, 2023 · Updated 3 years ago
carolinlawrence / gradient-rollback
Code for gradient rollback, which explains predictions of neural matrix factorization models, as for example used for knowledge base comp…
☆21 · Mar 16, 2021 · Updated 5 years ago
google-research-datasets / query-wellformedness
25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural languag…
☆85 · Oct 9, 2018 · Updated 7 years ago
adrianeboyd / boyd-wnut2018
Code and data for: Low Resource Grammatical Error Correction Using Wikipedia Edits (WNUT 2018)
☆17 · Jul 16, 2024 · Updated 2 years ago
velocityCavalry / CREPE
An original implementation of the paper "CREPE: Open-Domain Question Answering with False Presuppositions"
☆16 · Nov 5, 2024 · Updated last year
sai-prasanna / lmproof
Language model powered proofreader for correcting contextual errors in natural language.
☆24 · Jul 6, 2023 · Updated 3 years ago
guillaume-be / SentencePiece-Rust-example
Supporting example for "A Rust SentencePiece implementation"
☆20 · Jun 7, 2020 · Updated 6 years ago
simonepri / fever-transformers
📄 Evidence Retrieval and Claim Verification for the FEVER shared task using Transformer Networks
☆12 · Feb 21, 2020 · Updated 6 years ago
shashiongithub / Rainbow-Parser
The Rainbow Parser
☆17 · Mar 5, 2018 · Updated 8 years ago
xiamengzhou / NLPerf
Performance Prediction for NLP Tasks
☆17 · May 5, 2020 · Updated 6 years ago
mtszkw / fast-torch
Comparing PyTorch, JIT and ONNX for inference with Transformers
☆19 · Feb 22, 2021 · Updated 5 years ago
clab / cnn-v1
Legacy version of CNN neural net toolkit (now called dynet)
☆19 · Oct 8, 2016 · Updated 9 years ago
nec-research / st_tau
This repository contains code for the paper "Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs" (Wang, Lawrence…
☆17 · Mar 8, 2021 · Updated 5 years ago
mjstrobl / WEXEA
Wikipedia EXhaustive Entity Annotator (LREC 2020)
☆16 · Apr 22, 2024 · Updated 2 years ago
google-research-datasets / paws
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…
☆571 · Jan 4, 2022 · Updated 4 years ago
cyrilou242 / learning-lightnr
Generate multiple choice fill-in-the-blank questions from any article.
☆13 · Dec 8, 2022 · Updated 3 years ago
gkiril / MinSCIE
MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations.
☆15 · Jun 9, 2019 · Updated 7 years ago
xlhex / dpe
☆22 · Oct 26, 2020 · Updated 5 years ago
allenai / sledgehammer
☆48 · Jun 8, 2020 · Updated 6 years ago
harvardnlp / cascaded-generation
Cascaded Text Generation with Markov Transformers
☆130 · Mar 20, 2023 · Updated 3 years ago
tomhosking / torchseq
PyTorch Seq2Seq framework
☆27 · Feb 18, 2026 · Updated 5 months ago
shrimai / Towards-Content-Transfer-through-Grounded-Text-Generation
☆33 · May 15, 2019 · Updated 7 years ago
marcotcr / sears
Code for "Semantically Equivalent Adversarial Rules for Debugging NLP Models"
☆88 · Oct 16, 2018 · Updated 7 years ago
raosudha89 / GYAFC-corpus
This is Grammarly's Yahoo Answers Formality Corpus
☆108 · Jul 7, 2025 · Updated last year
miyyer / scpn
Syntactically controlled paraphrase networks
☆168 · Dec 30, 2018 · Updated 7 years ago
rloganiv / kglm-data
Code used to create the Linked WikiText-2 dataset
☆16 · May 22, 2023 · Updated 3 years ago
mrqa / MRQA-Shared-Task-2019
Resources for the MRQA 2019 Shared Task
☆294 · Aug 5, 2021 · Updated 4 years ago
kakaobrain / helo-word
Team Kakao&Brain's Grammatical Error Correction System for the ACL 2019 BEA Shared Task
☆93 · Sep 19, 2019 · Updated 6 years ago
tunib-ai / artwork_captions
Machine Generated Captions for Best Artworks
☆22 · Sep 21, 2022 · Updated 3 years ago
LittleYUYU / Interactive-Semantic-Parsing
Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning (AAAI'19)
☆28 · Oct 31, 2019 · Updated 6 years ago
grammatical / pretraining-bea2019
Models, system configurations and outputs of our winning GEC systems in the BEA 2019 shared task described in R. Grundkiewicz, M. Junczys…
☆52 · Oct 22, 2019 · Updated 6 years ago
vid-koci / KBCtransferlearning
Code accompanying the paper "Knowledge Base Completion Meets Transfer Learning"
☆15 · Feb 21, 2024 · Updated 2 years ago
nec-research / KGEval
A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.
☆15 · Aug 3, 2022 · Updated 3 years ago
grammatical / magec-wnut2019
Models and training scripts for the English, German and Russian MAGEC systems described in R. Grundkiewicz, M. Junczys-Dowmunt: Minimally…
☆12 · Jul 7, 2021 · Updated 5 years ago
timvieira / rl
Reference implementation of algorithms for reinforcement learning and Markov decision processes.
☆12 · Jan 28, 2021 · Updated 5 years ago