ruathudo / post-ocr-correction
☆11 · Updated 3 years ago
Alternatives and similar repositories for post-ocr-correction
Users who are interested in post-ocr-correction are comparing it to the libraries listed below.
- A simple neural truecaser written in PyTorch and AllenNLP. ☆33 · Updated last year
- GrammarTagger: A Neural Multilingual Grammar Profiler for Language Learning ☆27 · Updated 4 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Updated 2 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆38 · Updated 3 years ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 · Updated 3 years ago
- Post-processing OCR errors with seq2seq models ☆28 · Updated 4 years ago
- Zero-shot Transfer Learning from English to Arabic ☆29 · Updated 3 years ago
- ☆17 · Updated 2 years ago
- LAReQA is a challenging benchmark for evaluating language agnostic answer retrieval from a multilingual candidate pool. This repository c… ☆14 · Updated 5 years ago
- GC4LM: A Colossal (Biased) language model for German ☆13 · Updated 4 years ago
- Correction of spaces with character-based neural language models. ☆13 · Updated 2 years ago
- ☆17 · Updated 10 months ago
- An implementation of GrASP (Shnarch et al., 2017) ☆21 · Updated 2 years ago
- ☆139 · Updated last year
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at… ☆22 · Updated 10 months ago
- List of corpora annotated for coreference for different languages ☆17 · Updated 10 months ago
- ☆17 · Updated 2 years ago
- Python 3 library for processing historical English ☆67 · Updated 10 months ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 8 months ago
- BERT models for many languages created from Wikipedia texts ☆33 · Updated 5 years ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions ☆19 · Updated 2 years ago
- Multilingual Open Text ☆25 · Updated last month
- Code for extracting parallel corpora from PMIndia ☆16 · Updated 5 years ago
- Combining encoder-based language models ☆11 · Updated 3 years ago
- Zero-vocab or low-vocab embeddings ☆18 · Updated 2 years ago
- SemEval-2021 Multilingual and Cross-lingual Word-in-Context Task ☆18 · Updated 4 years ago
- Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages ☆14 · Updated 5 years ago
- Statistics on multilingual datasets ☆17 · Updated 2 years ago
- This repository contains the Arabic sarcasm dataset (ArSarcasm) ☆24 · Updated 4 years ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 · Updated 2 years ago