daltonfury42 / truecase
A Python truecasing utility that restores case information for text
☆88 · Updated last year
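To illustrate what "restoring case information" means, here is a minimal sketch that assumes a simple frequency-based approach: learn the most common casing of each word from a cased corpus, then apply it to lowercased input. This is a generic baseline, not necessarily how the truecase package works internally. The names `train_casing_model` and `restore_case` are hypothetical and do not reflect the library's API.

```python
import re
from collections import Counter, defaultdict

# Tokens are runs of word characters/apostrophes, or single punctuation marks.
TOKEN_RE = re.compile(r"[\w']+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def _is_word(token):
    return re.fullmatch(r"[\w']+", token) is not None


def train_casing_model(sentences):
    """Map each lowercased token to its most frequent cased form (hypothetical helper).

    The first token of every sentence is skipped, since its capital letter
    reflects position rather than the word itself.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = TOKEN_RE.findall(sentence)
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def restore_case(text, model):
    """Re-case lowercased text with the learned forms (hypothetical helper)."""
    tokens = [model.get(tok, tok) for tok in TOKEN_RE.findall(text.lower())]

    # Capitalize the first word of the text and of each following sentence.
    capitalize_next = True
    for i, tok in enumerate(tokens):
        if capitalize_next and _is_word(tok):
            tokens[i] = tok[:1].upper() + tok[1:]
            capitalize_next = False
        elif tok in SENTENCE_END:
            capitalize_next = True

    # Join words with spaces; punctuation attaches to the preceding token.
    pieces = []
    for tok in tokens:
        if pieces and _is_word(tok):
            pieces.append(" ")
        pieces.append(tok)
    return "".join(pieces)


if __name__ == "__main__":
    corpus = [
        "I flew to New York last week.",
        "The weather in New York was cold.",
        "She moved to New York in spring.",
        "We talked about the new plan.",
    ]
    model = train_casing_model(corpus)
    print(restore_case("the weather in new york was cold.", model))
    # -> The weather in New York was cold.
```

Because a per-word model ignores context, the same corpus would turn "the new plan" into "the New plan"; context-aware approaches such as n-gram language models are the usual way to resolve that kind of ambiguity.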
Related projects:
- Language-independent truecaser in Python. ☆161 · Updated 2 years ago
- Pre-trained models and code and data to train and use models from "Pushing the Limits of Paraphrastic Sentence Embeddings with Millions o… ☆101 · Updated 9 months ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆63 · Updated last year
- Automatic extraction of edited sentences from text edition histories. ☆80 · Updated 2 years ago
- This is the reference implementation of commonly used coreference metrics. ☆74 · Updated 6 years ago
- SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization. ☆135 · Updated last year
- Easier Automatic Sentence Simplification Evaluation ☆157 · Updated 11 months ago
- Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/ ☆181 · Updated 11 months ago
- GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge (EMNLP 2019) ☆91 · Updated last year
- Multilingual abstractive summarization dataset extracted from WikiHow. ☆80 · Updated 3 years ago
- JFLEG (JHU FLuency-Extended GUG) corpus for Grammatical Error Correction Evaluation ☆112 · Updated last year
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ☆66 · Updated 3 years ago
- OpusFilter - Parallel corpus processing toolkit ☆101 · Updated last month
- ☆36 · Updated 2 years ago
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆63 · Updated last year
- SUPERT: Unsupervised multi-document summarization evaluation & generation ☆91 · Updated last year
- Transformer-based translation quality estimation ☆106 · Updated last year
- A Python module for word inflections designed for use with spaCy. ☆90 · Updated 4 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆80 · Updated 3 weeks ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal… ☆80 · Updated 3 years ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia ☆72 · Updated 9 years ago
- Text Simplification System and Dataset ☆123 · Updated last year
- [EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction ☆118 · Updated 2 years ago
- Coreference Resolution With Entity Equalization ☆40 · Updated last year
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆72 · Updated 2 months ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆54 · Updated 2 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆148 · Updated 3 months ago
- LASER multilingual sentence embeddings as a pip package ☆224 · Updated last year
- A tool that locates, downloads, and extracts machine translation corpora ☆145 · Updated 3 months ago
- Efficient Low-Memory Aligner ☆135 · Updated 2 weeks ago