shrutirij / ocr-post-correction
☆134 Updated 8 months ago
Related projects
Alternatives and complementary repositories for ocr-post-correction
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆35 Updated 11 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆66 Updated last year
- OCR post correction for old German corpus ☆19 Updated 2 years ago
- ☆11 Updated 2 years ago
- Python 3 library for processing historical English ☆64 Updated 3 months ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆36 Updated 2 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 5 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆173 Updated last year
- Data for the HIPE 2022 shared task. ☆15 Updated 11 months ago
- ☆74 Updated 2 years ago
- multimodal document analysis ☆159 Updated 5 months ago
- ☆37 Updated 3 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆85 Updated last month
- Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0. ☆101 Updated 2 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆152 Updated 2 years ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆52 Updated last year
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated 6 months ago
- ☆75 Updated last year
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated 8 months ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated last year
- Python-based implementation of the Translate-Align-Retrieve method to automatically translate the SQuAD Dataset to Spanish. ☆59 Updated last year
- Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing" ☆33 Updated 2 years ago
- OCR & Ground Truth Resources ☆74 Updated 2 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆91 Updated last year
- xfspell: the Transformer Spell Checker ☆187 Updated 4 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆76 Updated 4 months ago
- An implementation of GrASP (Shnarch et al., 2017) ☆21 Updated 2 years ago
- This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with enti… ☆242 Updated last year
- CVPR 2022: Table Structure Recognition ☆39 Updated 2 years ago
- Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters" ☆197 Updated last year