google-research-datasets / C4_200M-synthetic-dataset-for-grammatical-error-correction

This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 with a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/).
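The general shape of this kind of data generation can be sketched as follows: a clean sentence plus a tag goes in, and a (corrupted source, clean target) training pair comes out. This is a toy illustration only. The rule-based corruption functions and the tag names `DET` and `WO` are hypothetical stand-ins chosen for the example. The dataset itself was produced by a learned tagged corruption model, not by rules like these.

```python
import random

# Hypothetical rule-based stand-ins for error types. The real C4_200M
# corruptions come from a learned tagged corruption model, not these rules.

def drop_determiner(tokens, rng):
    """Delete one determiner, if the sentence has any."""
    idx = [i for i, t in enumerate(tokens) if t.lower() in {"a", "an", "the"}]
    if not idx:
        return tokens
    i = rng.choice(idx)
    return tokens[:i] + tokens[i + 1:]

def swap_adjacent(tokens, rng):
    """Swap two neighbouring tokens to simulate a word-order error."""
    if len(tokens) < 2:
        return tokens
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out

# Hypothetical tag names mapped to the corruption each one triggers.
CORRUPTORS = {"DET": drop_determiner, "WO": swap_adjacent}

def make_pair(clean, tag, seed=0):
    """Return a (corrupted source, clean target) pair for the given tag."""
    rng = random.Random(seed)
    tokens = clean.split()
    corrupted = CORRUPTORS[tag](tokens, rng)
    return " ".join(corrupted), clean

src, tgt = make_pair("The cat sat on the mat .", "DET")
print(src, "->", tgt)
```

The pair orientation (corrupted input, clean output) matches how the synthetic data is used: a correction model learns to map the corrupted sentence back to the original C4 sentence.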

Alternatives and similar repositories for C4_200M-synthetic-dataset-for-grammatical-error-correction:

Users who are interested in C4_200M-synthetic-dataset-for-grammatical-error-correction are comparing it to the libraries listed below.