Data for the HIPE 2022 shared task.
☆21, updated Nov 29, 2023
Alternatives and similar repositories for HIPE-2022-data
Users interested in HIPE-2022-data also compare it to the repositories listed below.
- A Python module for evaluating NERC and NEL system performance as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer). (☆15, updated Jun 4, 2024)
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at… (☆21, updated Aug 1, 2024)
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions (☆20, updated Mar 27, 2023)
- CERberus: guardian against character errors (☆29, updated Feb 15, 2024)
- Latin texts annotated for named entities and NER tagger used for the Herodotos Project (Ohio State University / Ghent University) (☆11, updated Sep 26, 2022)
- Libraries, Archives and Museums (LAM) (☆88, updated Oct 4, 2022)
- The training code for Jasper-Token-Compression-600M (☆19, updated Nov 19, 2025)
- PathPiece tokenizer (☆13, updated Nov 10, 2024)
- Named Entity Recognition (☆19, updated Feb 13, 2026)
- German GPT-2 model (☆32, updated Aug 17, 2021)
- General repository for Data Science bootcamps at Coding Dojo (☆11, updated Nov 13, 2025)
- Contextualized per-token embeddings (☆34, updated May 11, 2025)
- Compiled tools, datasets, and other resources for historical text normalization. (☆20, updated Jun 18, 2019)
- Repository for "Towards Robust Named Entity Recognition for Historic German" (☆18, updated Dec 11, 2020)
- Codebase for running (conditional) probing experiments (☆22, updated Nov 13, 2022)
- BPE modification that removes intermediate tokens during tokenizer training. (☆26, updated Nov 25, 2024)
- OCR post-correction for an old German corpus (☆19, updated Aug 29, 2022)
- Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho… (☆22, updated Sep 2, 2022)
- (no description) (☆44, updated Feb 11, 2026)
- Code for SaGe subword tokenizer (EACL 2023) (☆27, updated Nov 30, 2024)
- A collection of notebooks for Natural Language Processing (☆25, updated Jan 13, 2025)
- [NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs (☆43, updated Feb 11, 2026)
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" (☆28, updated Oct 3, 2021)
- 🚀🤗 A collection of templates for Hugging Face Spaces (☆35, updated Oct 9, 2023)
- A Pythonic API and some command line tools to access the Transkribus server via its REST API (☆28, updated Nov 25, 2022)
- EMNLP 2021: Frustratingly Simple Pretraining Alternatives to Masked Language Modeling (☆34, updated Nov 21, 2021)
- Mathematical foundations of data analysis, Winter semester 22-23 (☆13, updated Jan 31, 2023)
- User-friendly viewer for Parquet files (☆10, updated Jan 10, 2026)
- (no description) (☆10, updated Sep 13, 2025)
- A library for probing Stockfish's NNUEs. The code for reading parameters and forward propagation is taken from Stockfish. (☆12, updated Nov 18, 2025)
- ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost (☆42, updated Nov 15, 2023)
- BERT and ELECTRA models trained on Europeana Newspapers (☆38, updated Dec 14, 2021)
- Modified Editorial website template. Based on Editorial theme in html5up.net, adapted for Jekyll by Andrew Bancich. (☆11, updated Jun 10, 2024)
- Convert Transkribus PAGE-XML to standard PAGE-XML (☆12, updated Dec 10, 2025)
- (no description) (☆53, updated Feb 10, 2025)
- Linear Attention for Efficient Bidirectional Sequence Modeling (☆15, updated May 13, 2025)
- The production website for SquiggleConf: a conference for excellent web dev tooling (☆11, updated Jan 27, 2026)
- Official implementation of the winning system at SemEval-2021 Task 11: NLP Contribution Graph (Best System Paper Award 🏆) (☆11, updated Aug 24, 2025)
- 0-Shot Tokenizer Transplant (☆14, updated May 16, 2025)