mhagiwara / github-typo-corpus

GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors

☆515

Alternatives and similar repositories for github-typo-corpus

Users who are interested in github-typo-corpus are comparing it to the libraries listed below.

yannvgn / laserembeddings
LASER multilingual sentence embeddings as a pip package
☆225 Updated 2 years ago
mhagiwara / xfspell
xfspell — the Transformer Spell Checker
☆190 Updated 5 years ago
TakeLab / spacy-udpipe
spaCy + UDPipe
☆163 Updated 3 years ago
notAI-tech / deepsegment
A sentence segmenter that actually works!
☆304 Updated 5 years ago
explosion / spacy-stanza
💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
☆739 Updated last year
dair-ai / nlp_newsletter
📰Natural language processing (NLP) newsletter
☆302 Updated 5 years ago
dbamman / litbank
Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities.
☆365 Updated 2 years ago
google-research-datasets / paws
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…
☆560 Updated 3 years ago
simonepri / lm-scorer
📃Language Model based sentences scoring library
☆308 Updated 3 years ago
adobe / NLP-Cube
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
☆562 Updated last year
chakki-works / sumeval
Well tested & Multi-language evaluation framework for text summarization.
☆626 Updated 3 years ago
kakaobrain / word2word
Easy-to-use word-to-word translations for 3,564 language pairs.
☆367 Updated 4 years ago
bjascob / LemmInflect
A python module for English lemmatization and inflection.
☆274 Updated 2 years ago
EmilStenstrom / conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
☆318 Updated 3 months ago
facebookresearch / moe
Misspelling Oblivious Word Embeddings
☆201 Updated 6 years ago
chrisjbryant / errant
ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.
☆454 Updated last year
fnl / syntok
Text tokenization and sentence segmentation (segtok v2)
☆207 Updated 3 years ago
robustness-gym / summvis
SummVis is an interactive visualization tool for text summarization.
☆253 Updated 3 years ago
explosion / tokenizations
Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/
☆193 Updated 2 years ago
neulab / compare-mt
A tool for holistic analysis of language generation systems
☆471 Updated 2 months ago
R1j1t / contextualSpellCheck
✔️Contextual word checker for better suggestions (not actively maintained)
☆418 Updated 9 months ago
nreimers / truecaser
Language independent truecaser in Python.
☆160 Updated 4 years ago
argilla-io / spacy-wordnet
spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface
☆261 Updated 3 months ago
bitextor / bitextor
Bitextor generates translation memories from multilingual websites
☆296 Updated last year
facebookresearch / MLQA
New dataset
☆309 Updated 4 years ago
facebookresearch / vizseq
An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)
☆446 Updated this week
google-research-datasets / tydiqa
TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and …
☆317 Updated 5 years ago
Hyperparticle / udify
A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology…
☆223 Updated 2 years ago
facebookresearch / ELI5
Scripts and links to recreate the ELI5 dataset.
☆326 Updated 4 years ago
Unbabel / OpenKiwi
Open-Source Machine Translation Quality Estimation in PyTorch
☆231 Updated 3 years ago