ad-freiburg / tokenization-repair
Correction of spaces with character-based neural language models.
☆13 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for tokenization-repair
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction" ☆18 · Updated 3 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spaCy provide state of … ☆61 · Updated 4 years ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆21 · Updated 2 years ago
- Zero-shot Transfer Learning from English to Arabic ☆29 · Updated 2 years ago
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ☆27 · Updated 3 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated 3 months ago
- Fast whitespace correction with Transformers ☆14 · Updated 6 months ago
- ☆17 · Updated last year
- OpusFilter: Parallel corpus processing toolkit ☆102 · Updated 3 months ago
- NewsQuizQA is a quiz-style question-answer dataset used for generating quiz questions about the news ☆34 · Updated 3 years ago
- Code for the EMNLP 2022 paper "Improved grammatical error correction by ranking elementary edits" ☆19 · Updated last year
- Source code for the paper "Grammatical Error Correction in Low-Resource Scenarios" (W-NUT 2019) ☆13 · Updated 2 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated last year
- Showcasing training on various downstream NLP tasks with pre-trained language models using PyTorch Lightning ☆12 · Updated 2 years ago
- Multilingual abstractive summarization dataset extracted from WikiHow. ☆85 · Updated 3 years ago
- A simple neural truecaser written in PyTorch and AllenNLP. ☆32 · Updated 5 months ago
- SemEval-2021 Multilingual and Cross-lingual Word-in-Context Task ☆18 · Updated 3 years ago
- ☆54 · Updated last year
- Statistics on multilingual datasets ☆17 · Updated 2 years ago
- Code and data for the paper "Soft Gazetteers for Low-resource Named Entity Recognition" ☆19 · Updated 4 years ago
- Benchmarking various Deep Learning models such as BERT, ALBERT, and BiLSTMs on the task of sentence entailment using two datasets: MultiNLI … ☆27 · Updated 3 years ago
- Source code for Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction ☆43 · Updated 3 years ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago
- XED multilingual emotion datasets ☆56 · Updated last year
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆23 · Updated 2 years ago
- Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document into any chosen number k of segments. ☆32 · Updated 5 years ago
- Evaluation suite for testing automatic grammatical error corrections ☆38 · Updated 7 years ago
- Summary of responses to the questionnaire on annotation platforms: https://forms.gle/iZk8kehkjAWmB8xe9 ☆58 · Updated 4 years ago
- This repository contains materials for our tutorial on automatic grammatical error correction: R. Grundkiewicz, C. Bryant, M. Felice: A C… ☆38 · Updated 3 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆43 · Updated 6 months ago