Shef-AIRE / llms_post-ocr_correction
Leveraging LLMs for Post-OCR Correction of Historical Newspapers
☆15Updated last year
Alternatives and similar repositories for llms_post-ocr_correction
Users who are interested in llms_post-ocr_correction are comparing it to the libraries listed below
- You Actually Look Twice At it☆38Updated last year
- A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.☆24Updated 2 years ago
- Multilingual sentence alignment using sentence embeddings☆139Updated last year
- ☆11Updated 4 years ago
- A tool for automatic spelling normalization☆21Updated 5 years ago
- Small-vocabulary neural sequence-to-sequence generation with optional feature conditioning☆35Updated 3 weeks ago
- CERberus -- guardian against character errors☆29Updated last year
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)☆56Updated 2 years ago
- ☆66Updated this week
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"☆39Updated 2 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)☆74Updated 10 months ago
- Data for the HIPE 2022 shared task.☆21Updated 2 years ago
- ☆141Updated last year
- OCR post correction for old German corpus☆19Updated 3 years ago
- Latin BERT☆70Updated last year
- BERT and ELECTRA models trained on Europeana Newspapers☆38Updated 4 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation☆31Updated 2 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies)☆34Updated 7 months ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆54Updated 2 years ago
- ☆50Updated last year
- Python version for Doug Biber's Multidimensional Analysis (MDA)☆39Updated 2 months ago
- A python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer).☆15Updated last year
- Master repository which includes most other OCR-D repositories as submodules☆72Updated 7 months ago
- A PyPI package for fast word/character error rate (WER/CER) calculation☆71Updated 2 years ago
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities.☆127Updated 4 years ago
- Layout Analysis Dataset with Segmonto (LADaS)☆23Updated 6 months ago
- Humanities Entity Recognition: robust, practical, efficient Named Entity Recognition for today's digital humanist☆37Updated 6 years ago
- Named entity annotation tool☆28Updated 2 years ago
- Norwegian Speech Transformer Models☆19Updated 3 months ago
- An OCR evaluation tool☆68Updated 5 months ago