Shef-AIRE / llms_post-ocr_correction
Leveraging LLMs for Post-OCR Correction of Historical Newspapers
☆11Updated 10 months ago
Alternatives and similar repositories for llms_post-ocr_correction:
Users that are interested in llms_post-ocr_correction are comparing it to the libraries listed below
- You Actually Look Twice At it☆33Updated 3 months ago
- ☆60Updated this week
- Page-wise text recognition with lower-supervision line data models☆31Updated 2 weeks ago
- Ground Truth Resources for the HTR of patrimonial documents☆42Updated this week
- The repository provides access to the source code for Transcription Pearl, a Handwritten Text Recognition (HTR) tool that uses AI to tr…☆31Updated 5 months ago
- Layout Analysis Dataset with Segmonto (LADaS)☆20Updated 2 months ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"☆36Updated last year
- A model(ing framework) for sample efficient OCR☆57Updated 2 years ago
- Finding mentions and citations to named and implicit research datasets from within the academic literature☆24Updated 6 months ago
- A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.☆23Updated last year
- Extension for pie to include taggers with their models and pre/postprocessors☆10Updated 10 months ago
- This repo works as a sandbox environment for htrflow.☆32Updated last month
- Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho…☆20Updated 2 years ago
- The grobidmonkey package is an open-source package designed for postprocessing GROBID outputs.☆11Updated last year
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️☆168Updated 10 months ago
- Layout analysis to find layout elements in documents (similar to P2PaLA)☆19Updated 2 weeks ago
- Tools for normalizing the use of some characters and checking file consistencies☆11Updated 3 months ago
- An OCR evaluation tool☆65Updated this week
- Small collection of PAGE XML related scripts used at the ZPD Würzburg☆13Updated 8 months ago
- Repository hosting the common code for the entity-fishing clients☆10Updated 11 months ago
- dhSegment on pytorch☆34Updated last year
- Automatic transcription models for Chinese historical documents trained with the kraken OCR engine☆13Updated last year
- FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning. Presented at EACL 2023.☆28Updated last year
- Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset☆27Updated 2 years ago
- Software to detect text reuse with BLAST.☆14Updated 5 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆175Updated 2 years ago
- Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19…☆16Updated 6 months ago
- TopicGPT integrates the benefits of LLMs into topic modelling☆89Updated 10 months ago
- High-performance text aligner for large collections of texts☆51Updated last week
- Annotation tool (NER) for XML documents (TEI, EAD) - WIP☆10Updated 2 years ago