Leveraging LLMs for Post-OCR Correction of Historical Newspapers
☆17May 12, 2026Updated last month
Alternatives and similar repositories for llms_post-ocr_correction
Users who are interested in llms_post-ocr_correction are comparing it to the libraries listed below.
- annotation storage backend☆11Apr 3, 2025Updated last year
- Decodes Compact Disc data from microscope images of a CD's surface☆12Jan 14, 2023Updated 3 years ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models"☆39Dec 2, 2023Updated 2 years ago
- Semantically Search Emojis From the Command Line!☆13Nov 26, 2023Updated 2 years ago
- LLM-only topic extraction and classification☆11Jun 3, 2026Updated last month
- Simple tool for generating tokens with open source transformers and/or calculate per-token surprisal.☆14Apr 15, 2026Updated 2 months ago
- veraPDF PDF parser☆35Jun 23, 2026Updated last week
- Generating text from RDF data with sequence to sequence models☆11Jul 25, 2018Updated 7 years ago
- Interactive Visualization Interface for Multidimensional Datasets☆68Nov 11, 2025Updated 7 months ago
- The imdb files with SBD-Trans OCR for TextVQA dataset.☆11Nov 30, 2021Updated 4 years ago
- ACL'2024-Main: Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Languag…☆12Sep 19, 2025Updated 9 months ago
- ☆14Jun 25, 2024Updated 2 years ago
- Showcase how mxbai-embed-large-v1 can be used to produce binary embedding. Binary embeddings enabled 32x storage savings and 40x faster r…☆19Mar 23, 2024Updated 2 years ago
- ☆22Jul 22, 2022Updated 3 years ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.☆21Nov 19, 2024Updated last year
- Votrax SC01A digital parts simulation☆18Dec 23, 2016Updated 9 years ago
- ParCourE - Parallel Corpus Explorer☆12Dec 27, 2021Updated 4 years ago
- Distribution of word meanings in Wikipedia for English, Italian, French, German and Spanish.☆10Jan 4, 2021Updated 5 years ago
- Morphological analysis for Udmurt.☆12May 23, 2026Updated last month
- by NF6X. Forked from: https://gitlab.com/NF6X_Retrocomputing/8051dumper☆26Dec 7, 2021Updated 4 years ago
- Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages☆15Apr 11, 2020Updated 6 years ago
- Python course materials for students of the continuing professional education program "Computational Linguistics" at HSE University (2020-2021)☆12Feb 21, 2022Updated 4 years ago
- Umbrella repository that describes the collections contained in any given release of ELTeC☆13Jan 26, 2022Updated 4 years ago
- Interlinear glossing with JS & CSS☆20Aug 23, 2015Updated 10 years ago
- Materials for the course "Computational Linguistics and Information Technologies" for 4th-year undergraduates in the "Fundamental and Appl…☆10Mar 25, 2021Updated 5 years ago
- ☆17Jul 30, 2024Updated last year
- Qualitative coding for computer scientists☆26Jun 25, 2026Updated last week
- Legacy version of CNN neural net toolkit (now called dynet)☆19Oct 8, 2016Updated 9 years ago
- Presentations, tutorials and data for the OCR workshop at LMU☆16Jun 2, 2017Updated 9 years ago
- Full Stack of Latvian Language Resources for Natural Language Understanding (NLU) and Generation (NLG)☆16Oct 20, 2022Updated 3 years ago
- Compute benchmark of table structure recognition.☆30Dec 2, 2025Updated 7 months ago
- Sequence Tagging with Cross-Lingual Transfer Learning☆16Jul 30, 2017Updated 8 years ago
- IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, app…☆21Nov 7, 2025Updated 7 months ago
- Comparing Audio Features for Unsupervised Sound Classification☆10Jun 22, 2022Updated 4 years ago
- Table structure recognition: LGPMA inference☆25Nov 17, 2022Updated 3 years ago
- ☆10Aug 3, 2019Updated 6 years ago
- ☆21Apr 4, 2015Updated 11 years ago
- AI Journey 2019: Combined Solution☆15Dec 8, 2022Updated 3 years ago
- Transform audio files into mel spectrograms for text-to-speech model training☆12Aug 25, 2021Updated 4 years ago