Data for the HIPE 2022 shared task.
☆23 · May 15, 2026 · Updated last month
Alternatives and similar repositories for HIPE-2022-data
Users who are interested in HIPE-2022-data are comparing it to the repositories listed below.
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at… ☆21 · Aug 1, 2024 · Updated last year
- Latin texts annotated for named entities and NER tagger used for the Herodotos Project (Ohio State University / Ghent University) ☆12 · Sep 26, 2022 · Updated 3 years ago
- CERberus -- guardian against character errors ☆30 · Feb 15, 2024 · Updated 2 years ago
- Modules used for separating articles in (historical) newspapers and similar documents. This repository is part of the European Union's Ho… ☆22 · Sep 2, 2022 · Updated 3 years ago
- Repository for "Towards Robust Named Entity Recognition for Historic German" ☆18 · Dec 11, 2020 · Updated 5 years ago
- OCR post correction for old German corpus ☆20 · Aug 29, 2022 · Updated 3 years ago
- Named Entity Recognition ☆19 · Feb 13, 2026 · Updated 4 months ago
- PathPiece tokenizer ☆14 · Nov 10, 2024 · Updated last year
- Metrical position in Greek hexameter. ☆13 · Jun 25, 2026 · Updated last week
- ☆15 · Jul 11, 2022 · Updated 3 years ago
- Compiled tools, datasets, and other resources for historical text normalization. ☆21 · Jun 18, 2019 · Updated 7 years ago
- ☆26 · Jul 11, 2022 · Updated 3 years ago
- Pedalion trees ☆12 · Jan 24, 2023 · Updated 3 years ago
- Turn CTS TEI corpora into CEX collection files ☆12 · Jun 16, 2021 · Updated 5 years ago
- A bunch of modules that use/extend CLTK in order to work with Greek and Latin corpora maintained by the Perseus DL ☆12 · Oct 26, 2019 · Updated 6 years ago
- German GPT-2 model ☆32 · Aug 17, 2021 · Updated 4 years ago
- The training codes of Jasper-Token-Compression-600M ☆20 · Nov 19, 2025 · Updated 7 months ago
- BADLAD: Bengali Document Layout Analysis Dataset ☆15 · May 12, 2024 · Updated 2 years ago
- HuCit KB: a knowledge base of classical texts and citable text units. ☆11 · Nov 17, 2021 · Updated 4 years ago
- Teaching materials for the Applied Data Analysis course at DHOxSS. Data science methods to analyse humanities data. ☆41 · Jan 6, 2026 · Updated 5 months ago
- ☆14 · Jul 12, 2022 · Updated 3 years ago
- Self hosting code for Recogito-Studio ☆23 · Apr 13, 2026 · Updated 2 months ago
- ☆10 · May 8, 2026 · Updated last month
- Codebase for running (conditional) probing experiments ☆21 · Nov 13, 2022 · Updated 3 years ago
- Contextualized per-token embeddings ☆37 · Jun 23, 2026 · Updated last week
- ☆20 · Feb 17, 2024 · Updated 2 years ago
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 ☆24 · Jul 18, 2019 · Updated 6 years ago
- Unofficial implementation of QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition. ☆63 · Oct 15, 2022 · Updated 3 years ago
- Patterns based on the W3C Web Annotation Model, primarily for use in linking resources describing historical phenomena with the places re… ☆16 · Mar 6, 2020 · Updated 6 years ago
- Archive of the XML files of the Mannheim / Heidelberg CAMENA Neo-Latin project ☆20 · Oct 10, 2018 · Updated 7 years ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training. ☆27 · Nov 25, 2024 · Updated last year
- Detect and align similar passages ☆122 · Apr 27, 2026 · Updated 2 months ago
- Code for TACL 2020 paper "An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models" ☆14 · Jul 31, 2020 · Updated 5 years ago
- f("A1") = 𓀀; also A1.png ☆12 · Jun 4, 2026 · Updated 3 weeks ago
- The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data. ☆22 · Jun 20, 2026 · Updated last week
- Code for SaGe subword tokenizer (EACL 2023) ☆28 · Nov 30, 2024 · Updated last year
- In this repository we have all the codes that we have developed ☆12 · Sep 13, 2023 · Updated 2 years ago
- Pre-processing text and tokenization for UTH-BERT ☆10 · Sep 30, 2020 · Updated 5 years ago
- Convert Transkribus PAGE-XML to standard PAGE-XML ☆12 · Dec 10, 2025 · Updated 6 months ago