Shef-AIRE / llms_post-ocr_correction
Leveraging LLMs for Post-OCR Correction of Historical Newspapers
☆15 · Updated last year
Alternatives and similar repositories for llms_post-ocr_correction
Users who are interested in llms_post-ocr_correction are comparing it to the repositories listed below.
- You Actually Look Twice At it ☆37 · Updated 11 months ago
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ☆127 · Updated 4 years ago
- ☆141 · Updated last year
- Multilingual sentence alignment using sentence embeddings ☆135 · Updated last year
- Software for detecting text reuse with BLAST. ☆13 · Updated 6 years ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆38 · Updated 2 years ago
- Digital Humanities Across Borders ☆50 · Updated last year
- ☆11 · Updated 4 years ago
- ☆51 · Updated last year
- ☆32 · Updated 3 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆74 · Updated 9 months ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆52 · Updated 2 years ago
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions ☆19 · Updated 2 years ago
- The grobidmonkey package is an open-source package designed for postprocessing GROBID outputs. ☆12 · Updated last year
- Python 3 library for processing historical English ☆67 · Updated last year
- A collection of notebooks for Natural Language Processing ☆25 · Updated 11 months ago
- ☆63 · Updated 3 weeks ago
- CERberus -- guardian against character errors ☆29 · Updated last year
- A fully fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages. ☆24 · Updated 2 years ago
- Extension for pie to include taggers with their models and pre/postprocessors ☆11 · Updated last year
- Master repository which includes most other OCR-D repositories as submodules ☆72 · Updated 6 months ago
- A module to compute textual lexical richness (aka lexical diversity). ☆112 · Updated 2 years ago
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆38 · Updated last month
- Latin BERT ☆69 · Updated last year
- Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho… ☆22 · Updated 3 years ago
- Universal Romanizer that can convert any Unicode script to Roman (Latin) script ☆233 · Updated last year
- Small-vocabulary neural sequence-to-sequence generation with optional feature conditioning ☆35 · Updated last week
- https://sites.google.com/site/multidimensionaltagger ☆38 · Updated 2 years ago
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 ☆25 · Updated 6 years ago
- Detect and align similar passages ☆115 · Updated 3 months ago