A collection of notebooks for Natural Language Processing
☆25 · Jan 13, 2025 · Updated last year
Alternatives and similar repositories for NLP-Notebooks-Newspaper-Collections
Users that are interested in NLP-Notebooks-Newspaper-Collections are comparing it to the libraries listed below.
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 ☆24 · Jul 18, 2019 · Updated 6 years ago
- ☆14 · Jul 11, 2022 · Updated 3 years ago
- Pipeline for the production of digital scholarly editions of archival collections ☆14 · Feb 22, 2024 · Updated 2 years ago
- Convert Transkribus PAGE-XML to standard PAGE-XML ☆12 · Dec 10, 2025 · Updated 4 months ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ☆13 · Dec 14, 2021 · Updated 4 years ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl… ☆19 · Jun 5, 2025 · Updated 10 months ago
- Small python package to measure OCR quality and other related metrics. ☆27 · Feb 19, 2024 · Updated 2 years ago
- The GitHub repository containing all the material related to the Computational Thinking and Programming course of the Digital Humanities … ☆30 · May 23, 2020 · Updated 5 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆28 · Oct 3, 2021 · Updated 4 years ago
- An extensible viewer for OCR-D mets.xml files ☆23 · May 30, 2024 · Updated last year
- IIIF Examples and useful code ☆20 · Sep 10, 2025 · Updated 7 months ago
- Text Corpus of African American Fiction and Poetry, from 1853-1923 ☆11 · Aug 5, 2020 · Updated 5 years ago
- Noise-robust de-duplication at scale ☆19 · Apr 9, 2023 · Updated 3 years ago
- A python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer). ☆15 · Jun 4, 2024 · Updated last year
- This repo contains files downloaded from Transkribus with corresponding suggested OCR improvements (performed using ChatGPT AI). ☆19 · Mar 3, 2026 · Updated last month
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2) ☆15 · Jan 20, 2026 · Updated 2 months ago
- OCRopus model for Gothic print (Fraktur) ☆19 · Feb 16, 2020 · Updated 6 years ago
- Constituency-level results and demographics used in the FT's analyses of the 2017 UK general election ☆10 · Jun 16, 2017 · Updated 8 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 · Nov 21, 2023 · Updated 2 years ago
- Named Entity Recognition ☆19 · Feb 13, 2026 · Updated 2 months ago
- DHLAB is a library of python modules for accessing text and pictures at the National Library of Norway. ☆25 · Oct 13, 2025 · Updated 6 months ago
- ☆17 · Mar 31, 2025 · Updated last year
- version 4.x of the Princeton Geniza Project ☆12 · Updated this week
- List of New York Times wedding announcements used in an Upshot story on name-changing. ☆10 · Mar 7, 2019 · Updated 7 years ago
- Python for Humanities ☆13 · Apr 7, 2026 · Updated last week
- ☆68 · Mar 23, 2026 · Updated 3 weeks ago
- Digital Research Methods with Mathematica, 2nd rev. ed., 2020 ☆15 · Sep 8, 2020 · Updated 5 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho… ☆22 · Sep 2, 2022 · Updated 3 years ago
- Automatic text comparison with an extendable variance classifier ☆13 · Sep 11, 2023 · Updated 2 years ago
- Up Business is a clean and modern landing page, inspired on light illustrations with a modern look, that can be used for companies or to … ☆29 · Jul 9, 2025 · Updated 9 months ago
- A digital edition of the 24 Probstücke of the Oberclasse by Johann Mattheson. ☆11 · Mar 25, 2026 · Updated 2 weeks ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆27 · Feb 16, 2026 · Updated last month
- Rhythm analysis toolkit in Python ☆13 · Sep 29, 2023 · Updated 2 years ago
- A simple vector space model based tool for sentiment analysis of literary texts ☆18 · Sep 17, 2024 · Updated last year
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Apr 30, 2023 · Updated 2 years ago
- Data for the HIPE 2022 shared task. ☆21 · Nov 29, 2023 · Updated 2 years ago
- Staged Training for Transformer Language Models ☆33 · Mar 31, 2022 · Updated 4 years ago
- ALTO XML schema - latest and all former versions ☆55 · Jan 20, 2026 · Updated 2 months ago