google-research-datasets/C4_200M-synthetic-dataset-for-grammatical-error-correction

This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021): https://www.aclweb.org/anthology/2021.bea-1.4/

☆163

Alternatives and similar repositories for C4_200M-synthetic-dataset-for-grammatical-error-correction

Users who are interested in C4_200M-synthetic-dataset-for-grammatical-error-correction compare it to the repositories listed below.

google-research-datasets / clang8
cLang-8 is a dataset for grammatical error correction.
☆113 · Jul 19, 2022 · Updated 4 years ago
awasthiabhijeet / PIE
Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models …
☆233 · Mar 24, 2023 · Updated 3 years ago
chrisjbryant / errant
ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.
☆466 · May 28, 2026 · Updated 2 months ago
gotutiyan / GEC-Info
Repository to collect and categorize Grammatical Error Correction papers.
☆127 · Jan 30, 2026 · Updated 5 months ago
thunlp / VERNet
Source code for Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
☆42 · Jul 2, 2021 · Updated 5 years ago
michiyasunaga / LM-Critic
[EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction
☆118 · Sep 26, 2021 · Updated 4 years ago
kanekomasahiro / bert-gec
☆120 · Sep 9, 2020 · Updated 5 years ago
grammarly / gector
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagg…
☆971 · May 21, 2024 · Updated 2 years ago
grammatical / coling2020-tutorial
This repository contains materials for our tutorial on automatic grammatical error correction: R. Grundkiewicz, C. Bryant, M. Felice: A C…
☆38 · Dec 12, 2020 · Updated 5 years ago
MaksTarnavskyi / gector-large
Improved version of GECToR
☆63 · Jul 24, 2023 · Updated 3 years ago
nusnlp / esc
The official code of the "Frustratingly Easy System Combination for Grammatical Error Correction" paper
☆57 · Mar 4, 2024 · Updated 2 years ago
kanekomasahiro / eb-gec
☆15 · Mar 15, 2022 · Updated 4 years ago
nusnlp / m2scorer
MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.
☆156 · Sep 27, 2022 · Updated 3 years ago
AutoTemp / Shallow-Aggressive-Decoding
Code for the paper "Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding" (ACL-IJCNLP 2021)
☆41 · Jun 7, 2021 · Updated 5 years ago
butsugiri / gec-pseudodata
Repository of "An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction" (EMNLP-IJCNLP 2019)
☆68 · Dec 23, 2019 · Updated 6 years ago
SimonHFL / CWEB
☆18 · Jan 8, 2021 · Updated 5 years ago
kanyun-inc / fairseq-gec
Source code for the paper "Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data"
☆251 · Jun 3, 2020 · Updated 6 years ago
cofe-ai / fast-gector
☆63 · Aug 2, 2023 · Updated 2 years ago
keisks / jfleg
JFLEG (JHU FLuency-Extended GUG) corpus for Grammatical Error Correction Evaluation
☆118 · Jun 11, 2023 · Updated 3 years ago
kanekomasahiro / grammatical-error-detection
☆18 · Sep 16, 2017 · Updated 8 years ago
snukky / wikiedits
Automatic extraction of edited sentences from text edition histories.
☆82 · Feb 14, 2022 · Updated 4 years ago
grammarly / GMEG
GMEG
☆33 · Nov 21, 2024 · Updated last year
AlexeySorokin / EditScorer
The code for the EMNLP 2022 paper "Improved grammatical error correction by ranking elementary edits"
☆21 · Dec 14, 2022 · Updated 3 years ago
ufal / low-resource-gec-wnut2019
Source code for the paper "Grammatical Error Correction in Low-Resource Scenarios" (W-NUT 2019)
☆13 · Jun 21, 2022 · Updated 4 years ago
thiborose / gecko-app
A web application that interfaces two GEC systems. [web instance is down]
☆32 · Aug 2, 2024 · Updated last year
neuspell / neuspell
NeuSpell: A Neural Spelling Correction Toolkit
☆713 · Jul 31, 2023 · Updated 2 years ago
grammarly / pillars-of-gec
Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models
☆32 · Apr 27, 2024 · Updated 2 years ago
Jason3900 / M2Convertor
Convert Standard M2 format to parallel sentences.
☆22 · Jun 20, 2020 · Updated 6 years ago
PrithivirajDamodaran / Gramformer
A framework for detecting, highlighting and correcting grammatical errors on natural language text. Created by Prithiviraj Damodaran. Ope…
☆1,586 · Feb 15, 2023 · Updated 3 years ago
li-aolong / TemplateGEC
ACL2023 (Oral): TemplateGEC: Improving Grammatical Error Correction with Detection Template
☆23 · Jul 10, 2023 · Updated 3 years ago
cehinson / ERRANT_ZH
☆15 · Jan 21, 2021 · Updated 5 years ago
mfelice / imeasure
☆14 · May 17, 2015 · Updated 11 years ago
nusnlp / neuqe
Neural quality estimation toolkit for grammatical error correction and other language generation applications.
☆49 · Mar 19, 2019 · Updated 7 years ago
shuo-git / InfECE
☆20 · Dec 31, 2020 · Updated 5 years ago
destwang / CTC2021
☆129 · Nov 3, 2022 · Updated 3 years ago
tomo-wb / Lang8-NAIST-extractor
☆30 · May 8, 2020 · Updated 6 years ago
Chunngai / gec-papers
Paper list for grammatical error correction (GEC).
☆49 · Mar 31, 2025 · Updated last year
cyrilou242 / learning-lightnr
Generate multiple choice fill-in-the-blank questions from any article.
☆13 · Dec 8, 2022 · Updated 3 years ago
Katsumata420 / generic-pretrained-GEC
Stronger Baselines for Grammatical Error Correction Using a Pretrained Encoder-Decoder Model.
☆37 · Apr 6, 2023 · Updated 3 years ago