mikahama / natas
Python 3 library for processing historical English
☆68 · Aug 10, 2024 · Updated last year
Alternatives and similar repositories for natas
Users that are interested in natas are comparing it to the libraries listed below
- ☆11 · Nov 14, 2021 · Updated 4 years ago
- DFKI Layout Detection for OCR-D ☆47 · May 1, 2025 · Updated 9 months ago
- Awesome AI in Libraries ☆17 · Jul 21, 2023 · Updated 2 years ago
- nnanno is a collection of tools that sample, annotate and apply computer vision to the Newspaper Navigator dataset ☆17 · Oct 16, 2024 · Updated last year
- Post-processing OCR errors with seq2seq models ☆28 · Jul 30, 2020 · Updated 5 years ago
- Web application for transcribing OCR ground truth from Archive.org ☆17 · Feb 22, 2018 · Updated 7 years ago
- Detect and align similar passages ☆117 · Sep 25, 2025 · Updated 4 months ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆91 · Nov 3, 2025 · Updated 3 months ago
- OCRopus model for Gothic print (Fraktur) ☆19 · Feb 16, 2020 · Updated 5 years ago
- Public API cache proxy built on the Earth Science Online Video Database, an Airtable base, which also syncs to Zotero and broadcasts new … ☆13 · Updated this week
- Newspaper Segmentation into images and text ☆12 · Jan 11, 2019 · Updated 7 years ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras… ☆11 · Dec 13, 2018 · Updated 7 years ago
- ☆13 · Dec 28, 2022 · Updated 3 years ago
- Convert PubLayNet data into METS/PAGE-XML ☆10 · Mar 17, 2020 · Updated 5 years ago
- Small collection of PAGE XML related scripts used at the ZPD Würzburg ☆12 · Aug 2, 2024 · Updated last year
- Tools for TICCL ☆14 · Dec 12, 2025 · Updated 2 months ago
- ☆27 · Feb 2, 2021 · Updated 5 years ago
- ☆10 · Mar 16, 2023 · Updated 2 years ago
- Umbrella repository that describes the collections contained in any given release of ELTeC ☆13 · Jan 26, 2022 · Updated 4 years ago
- ☆263 · Jul 7, 2025 · Updated 7 months ago
- Scrape and structure raw data from the Norwegian parliament's API. ☆12 · Oct 24, 2025 · Updated 3 months ago
- Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19… ☆16 · Oct 18, 2024 · Updated last year
- ☆13 · Jun 25, 2019 · Updated 6 years ago
- ☆13 · Jan 12, 2026 · Updated last month
- NLP pipeline software using common workflow language ☆35 · Apr 22, 2019 · Updated 6 years ago
- This repo works as a sandbox environment for htrflow. ☆39 · Updated this week
- ☆141 · Mar 5, 2024 · Updated last year
- Fast, permanent and flexible patterns for sharing and computing on texts with metadata using Apache Arrow. ☆15 · Mar 1, 2022 · Updated 3 years ago
- OCR-D post-correction module based on weighted finite-state transducers ☆11 · Jan 13, 2024 · Updated 2 years ago
- Glyph Miner, a system for extracting glyphs from early typeset prints ☆34 · Sep 29, 2016 · Updated 9 years ago
- ParlaMint: Comparable Parliamentary Corpora ☆74 · Nov 2, 2025 · Updated 3 months ago
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2) ☆14 · Jan 20, 2026 · Updated 3 weeks ago
- Named Entity Recognition ☆18 · Apr 9, 2025 · Updated 10 months ago
- Parallel Tar ☆15 · Oct 31, 2019 · Updated 6 years ago
- ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction ☆27 · Apr 25, 2019 · Updated 6 years ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. ☆36 · Updated this week
- Norwegian Speech Transformer Models ☆19 · Oct 17, 2025 · Updated 3 months ago
- A data validation tool for MARC records ☆25 · Dec 19, 2025 · Updated last month
- Master repository which includes most other OCR-D repositories as submodules ☆72 · Jul 4, 2025 · Updated 7 months ago