shrutirij / ocr-post-correction
☆136 Updated 10 months ago
Alternatives and similar repositories for ocr-post-correction
Users who are interested in ocr-post-correction are comparing it to the libraries listed below:
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆36 Updated last year
- ☆76 Updated 2 years ago
- OCR & Ground Truth Resources ☆74 Updated 2 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆37 Updated 3 years ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆66 Updated last year
- ☆11 Updated 3 years ago
- Code for the ICDAR2021 paper "Visual FUDGE: Form Understanding via Dynamic Graph Editing" ☆33 Updated 2 years ago
- Python 3 library for processing historical English ☆64 Updated 5 months ago
- OCR post correction for old German corpus ☆19 Updated 2 years ago
- 🧪 Cutting-edge experimental spaCy components and features ☆96 Updated 8 months ago
- multimodal document analysis ☆161 Updated 7 months ago
- ☆55 Updated 3 years ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated 2 years ago
- ☆37 Updated 3 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆174 Updated last year
- Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0. ☆102 Updated 2 years ago
- Publicly released code for the LAMBERT model ☆101 Updated 3 years ago
- ☆64 Updated last year
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated 10 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 7 months ago
- ☆79 Updated last year
- dhSegment on pytorch ☆33 Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆57 Updated last year
- A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy. ☆104 Updated 9 months ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ☆96 Updated last year
- Toolbox for OCR post-correction ☆122 Updated 5 years ago
- Form images from U.S. National Archives annotated with text bounding boxes, classes, relationships, and transcription. ☆36 Updated 2 years ago
- ☆44 Updated 5 months ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆52 Updated last year
- ☆106 Updated last year