Python 3 library for processing historical English
★ 68 · Aug 10, 2024 · Updated last year
Alternatives and similar repositories for natas
Users interested in natas are comparing it to the libraries listed below.
- ★ 11 · Nov 14, 2021 · Updated 4 years ago
- The amazing will normalize non-standard Finnish/Swedish and dialectalize standard Finnish! · ★ 30 · Aug 10, 2024 · Updated last year
- The NLG tool for Finnish · ★ 24 · Dec 13, 2023 · Updated 2 years ago
- Convert Transkribus PAGE-XML to standard PAGE-XML · ★ 12 · Dec 10, 2025 · Updated 2 months ago
- DFKI Layout Detection for OCR-D · ★ 47 · May 1, 2025 · Updated 10 months ago
- Correction of spaces with character-based neural language models. · ★ 13 · Aug 23, 2022 · Updated 3 years ago
- Awesome AI in Libraries · ★ 17 · Jul 21, 2023 · Updated 2 years ago
- nnanno is a collection of tools that sample, annotate and apply computer vision to the Newspaper Navigator dataset · ★ 17 · Oct 16, 2024 · Updated last year
- Post-processing OCR errors with seq2seq models · ★ 28 · Jul 30, 2020 · Updated 5 years ago
- IIIF Examples and useful code · ★ 20 · Sep 10, 2025 · Updated 5 months ago
- Web application for transcribing OCR ground truth from Archive.org · ★ 17 · Feb 22, 2018 · Updated 8 years ago
- OCRopus model for Gothic print (Fraktur) · ★ 19 · Feb 16, 2020 · Updated 6 years ago
- Public API cache proxy built on the Earth Science Online Video Database, an Airtable base, which also syncs to Zotero and broadcasts new … · ★ 13 · Updated this week
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras… · ★ 11 · Dec 13, 2018 · Updated 7 years ago
- Newspaper Segmentation into images and text · ★ 12 · Jan 11, 2019 · Updated 7 years ago
- Small collection of PAGE XML related scripts used at the ZPD Würzburg · ★ 12 · Aug 2, 2024 · Updated last year
- convert PubLayNet data into METS/PAGE-XML · ★ 10 · Mar 17, 2020 · Updated 5 years ago
- Scripts that clean up OCR and munge Hathi metadata. · ★ 77 · Nov 4, 2017 · Updated 8 years ago
- ★ 27 · Feb 2, 2021 · Updated 5 years ago
- ★ 10 · Mar 16, 2023 · Updated 2 years ago
- Umbrella repository that describes the collections contained in any given release of ELTeC · ★ 13 · Jan 26, 2022 · Updated 4 years ago
- ★ 263 · Jul 7, 2025 · Updated 8 months ago
- Bias correction for richness in abundance data · ★ 12 · Aug 18, 2025 · Updated 6 months ago
- NLP pipeline software using common workflow language · ★ 35 · Apr 22, 2019 · Updated 6 years ago
- Fast, permanent and flexible patterns for sharing and computing on texts with metadata using Apache Arrow. · ★ 15 · Mar 1, 2022 · Updated 4 years ago
- Sentiment Corpus for Swedish 🇸🇪 Norwegian 🇳🇴 Danish 🇩🇰 Finnish 🇫🇮 (and English 🏴󠁧󠁢󠁥󠁮󠁧󠁿) · ★ 15 · May 3, 2021 · Updated 4 years ago
- ★ 14 · Jul 11, 2022 · Updated 3 years ago
- Introduction to AI for GLAM · ★ 20 · Feb 6, 2026 · Updated last month
- Parallel Tar · ★ 15 · Oct 31, 2019 · Updated 6 years ago
- Named Entity Recognition · ★ 19 · Feb 13, 2026 · Updated 3 weeks ago
- Contains materials for a work in progress - "A Humanist's Cookbook for Natural Language Processing in Python." · ★ 41 · Nov 29, 2021 · Updated 4 years ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. · ★ 36 · Feb 20, 2026 · Updated 2 weeks ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" · ★ 39 · Dec 2, 2023 · Updated 2 years ago
- The GitHub repository for the AI for Humanists Project · ★ 20 · Jun 9, 2025 · Updated 9 months ago
- Master repository which includes most other OCR-D repositories as submodules · ★ 72 · Jul 4, 2025 · Updated 8 months ago
- A data validation tool for MARC records · ★ 27 · Dec 19, 2025 · Updated 2 months ago
- Presentations, tutorials and data for the OCR workshop at LMU · ★ 16 · Jun 2, 2017 · Updated 8 years ago
- Python API for KB data-services · ★ 19 · Jan 30, 2020 · Updated 6 years ago
- Self hosting code for Recogito-Studio · ★ 20 · Updated this week