Compiled tools, datasets, and other resources for historical text normalization.
☆20 · Jun 18, 2019 · Updated 6 years ago
Alternatives and similar repositories for histnorm
Users who are interested in histnorm compare it to the libraries listed below:
- A tool for automatic spelling normalization ☆21 · Jan 18, 2021 · Updated 5 years ago
- Digital humanities centered on graph technologies ☆10 · Feb 12, 2026 · Updated 3 weeks ago
- The website of the Oscar Project ☆11 · Mar 27, 2025 · Updated 11 months ago
- ☆15 · Aug 14, 2018 · Updated 7 years ago
- Temporarily remove unused tokens during training to save RAM and improve speed. ☆23 · Jun 15, 2025 · Updated 8 months ago
- Data for the HIPE 2022 shared task. ☆21 · Nov 29, 2023 · Updated 2 years ago
- ☆12 · Nov 3, 2024 · Updated last year
- UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi… ☆31 · Dec 5, 2022 · Updated 3 years ago
- ☆32 · Sep 27, 2021 · Updated 4 years ago
- Libraries, Archives and Museums (LAM) ☆88 · Oct 4, 2022 · Updated 3 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆14 · Jul 30, 2025 · Updated 7 months ago
- ☆10 · Feb 2, 2021 · Updated 5 years ago
- [ACL'20] Highway Transformer: A Gated Transformer. ☆33 · Dec 5, 2021 · Updated 4 years ago
- German GPT-2 model ☆32 · Aug 17, 2021 · Updated 4 years ago
- Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT". ☆35 · Mar 16, 2022 · Updated 3 years ago
- Linguistic Reconstruction with LingPy ☆15 · Aug 5, 2024 · Updated last year
- Modified version of fairseq, including new implementations for criterions using reinforcement learning methods. ☆11 · Aug 14, 2019 · Updated 6 years ago
- Modified Editorial website template. Based on Editorial theme in html5up.net, adapted for Jekyll by Andrew Bancich. ☆11 · Jun 10, 2024 · Updated last year
- Linear Attention for Efficient Bidirectional Sequence Modeling ☆15 · May 13, 2025 · Updated 9 months ago
- Public repository for Coptic SCRIPTORIUM Corpora Releases ☆40 · Dec 12, 2025 · Updated 2 months ago
- MATLAB code for Stein Point Markov Chain Monte Carlo. ☆13 · Jul 3, 2019 · Updated 6 years ago
- Creating crowdsourcing-based experiments made easy ☆10 · May 25, 2020 · Updated 5 years ago
- Identifying Nuances in Fake News vs. Satire: Using Semantic and Linguistic Cues (NLP4IF, EMNLP-IJCNLP 2019) ☆11 · Dec 21, 2020 · Updated 5 years ago
- ☆51 · Aug 18, 2024 · Updated last year
- ☆10 · Dec 17, 2020 · Updated 5 years ago
- decontamination ☆26 · Dec 3, 2025 · Updated 3 months ago
- Poetry Corpora Annotated on Aesthetic Emotions ☆12 · Aug 2, 2022 · Updated 3 years ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- ☆13 · Nov 28, 2025 · Updated 3 months ago
- Extension for pie to include taggers with their models and pre/postprocessors ☆11 · May 30, 2024 · Updated last year
- Latin texts annotated for named entities and NER tagger used for the Herodotos Project (Ohio State University / Ghent University) ☆11 · Sep 26, 2022 · Updated 3 years ago
- Label shift estimation for transfer difficulty with Familiarity. ☆10 · Feb 4, 2025 · Updated last year
- ☆13 · Dec 28, 2022 · Updated 3 years ago
- Collection of description of concepts, procedures, and simple XSLT files for text processing, e.g. simplify InDesign documents (.idml) to… ☆12 · Jan 9, 2020 · Updated 6 years ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 · Jul 11, 2023 · Updated 2 years ago
- Code for our paper Re-balancing Variational Autoencoder Loss for Molecule Sequence Generation. ☆11 · Sep 4, 2022 · Updated 3 years ago
- This repository provides the source code used to automatically generate the book summarization datasets described in the paper titled "Ec… ☆10 · Apr 14, 2025 · Updated 10 months ago
- ☆45 · Sep 26, 2021 · Updated 4 years ago
- Python tools for performing various operations on ALTO XML files ☆49 · Feb 27, 2025 · Updated last year