A collection of notebooks for Natural Language Processing
☆25Jan 13, 2025Updated last year
Alternatives and similar repositories for NLP-Notebooks-Newspaper-Collections
Users that are interested in NLP-Notebooks-Newspaper-Collections are comparing it to the libraries listed below
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"☆13Dec 14, 2021Updated 4 years ago
- ☆14Jul 11, 2022Updated 3 years ago
- Convert Transkribus PAGE-XML to standard PAGE-XML☆12Dec 10, 2025Updated 2 months ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated 10 months ago
- List of New York Times wedding announcements used in an Upshot story on name-changing.☆10Mar 7, 2019Updated 6 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆28Oct 3, 2021Updated 4 years ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.☆13Nov 21, 2023Updated 2 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow☆14Apr 30, 2023Updated 2 years ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl…☆19Jun 5, 2025Updated 9 months ago
- Named Entity Recognition☆19Feb 13, 2026Updated 2 weeks ago
- Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)☆15Jan 20, 2026Updated last month
- Noise-robust de-duplication at scale☆19Apr 9, 2023Updated 2 years ago
- A python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer).☆15Jun 4, 2024Updated last year
- IIIF Examples and useful code☆20Sep 10, 2025Updated 5 months ago
- OCRopus model for Gothic print (Fraktur)☆19Feb 16, 2020Updated 6 years ago
- Temporarily remove unused tokens during training to save RAM and speed up training.☆23Jun 15, 2025Updated 8 months ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"☆22Feb 14, 2024Updated 2 years ago
- Data for the HIPE 2022 shared task.☆21Nov 29, 2023Updated 2 years ago
- An extensible viewer for OCR-D mets.xml files☆22May 30, 2024Updated last year
- The GitHub repository containing all the material related to the Computational Thinking and Programming course of the Digital Humanities …☆30May 23, 2020Updated 5 years ago
- ☆66Feb 3, 2026Updated last month
- DHLAB is a library of python modules for accessing text and pictures at the National Library of Norway.☆25Oct 13, 2025Updated 4 months ago
- Small python package to measure OCR quality and other related metrics.☆27Feb 19, 2024Updated 2 years ago
- Staged Training for Transformer Language Models☆33Mar 31, 2022Updated 3 years ago
- Named entity annotation tool☆28Jul 6, 2023Updated 2 years ago
- 🚀🤗 A collection of templates for Hugging Face Spaces☆35Oct 9, 2023Updated 2 years ago
- ☆28Feb 24, 2025Updated last year
- Digital Humanities course site☆21Nov 22, 2021Updated 4 years ago
- An example of how to use spaCy for extremely large files without running into memory issues☆36Sep 17, 2022Updated 3 years ago
- Visualization course 2020☆13Oct 3, 2023Updated 2 years ago
- Glyph Miner, a system for extracting glyphs from early typeset prints☆34Sep 29, 2016Updated 9 years ago
- [NeurIPS 2025] Let LRMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604☆55Nov 4, 2025Updated 4 months ago
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- Script that converts JSONL output from Doccano to the BIO format☆10Jul 5, 2019Updated 6 years ago
- QGIS Plugin to explore Google Earth Engine Data Catalog☆11Sep 25, 2024Updated last year
- Linear Attention for Efficient Bidirectional Sequence Modeling☆15May 13, 2025Updated 9 months ago
- Collection of iPython notebooks with some quick demos☆11May 25, 2017Updated 8 years ago
- "Actionable Ethics for Data Scientists" Workshop Material @ ODSC☆10May 31, 2024Updated last year
- Simple-to-use scoring function for arbitrarily tokenized texts.☆47Feb 19, 2025Updated last year