Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
☆13Nov 26, 2024Updated last year
Alternatives and similar repositories for tokenizers_intrinsic_benchmark
Users that are interested in tokenizers_intrinsic_benchmark are comparing it to the libraries listed below.
- PathPiece tokenizer☆14Nov 10, 2024Updated last year
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?"☆18May 15, 2025Updated last year
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated last year
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020).☆11May 1, 2025Updated last year
- Complete set of English dialect transformation rules and evaluation code☆17Jun 7, 2024Updated last year
- 🌍 A simple script for taking automated screenshots from a Leaflet map☆15Mar 29, 2018Updated 8 years ago
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings)☆13Feb 4, 2025Updated last year
- This repository includes pneumonia detection on chest X-ray images using deep learning (Keras).☆22Nov 6, 2022Updated 3 years ago
- ☆10Nov 8, 2023Updated 2 years ago
- Code and models for the CVPR 2017 paper "DeepNav: Learning to Navigate Large Cities"☆13Feb 16, 2020Updated 6 years ago
- Split bib files for anthology bibliography for overleaf☆11Aug 25, 2024Updated last year
- PANiC - PAraphrasing Noun-Compounds☆15Apr 6, 2018Updated 8 years ago
- Demo server for TREC LiveQA competition☆11Dec 7, 2016Updated 9 years ago
- [EMNLP2025] Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling☆17Nov 20, 2025Updated 6 months ago
- Detail-Sensitive Panoramic Annular Semantic Segmentation☆12May 19, 2022Updated 4 years ago
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy☆15Jul 19, 2021Updated 4 years ago
- Code for the ILNewsDiff Twitter account☆10May 23, 2023Updated 3 years ago
- Simple-to-use scoring function for arbitrarily tokenized texts.☆48Feb 19, 2025Updated last year
- Event based Sign-Language-Translation☆20May 9, 2026Updated 2 weeks ago
- TensorFlow implementation of "Generating Sentences from a Continuous Space"☆11Sep 16, 2019Updated 6 years ago
- ☆29May 6, 2026Updated 3 weeks ago
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.☆13Jan 5, 2023Updated 3 years ago
- Dialogue Act classification☆18Jan 15, 2024Updated 2 years ago
- 🔄 ASCII / IPA conversion for Typst☆22Jan 8, 2026Updated 4 months ago
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th…☆22Mar 20, 2024Updated 2 years ago
- Code for "Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations" [NAACL Findings 2024]☆14Apr 3, 2026Updated last month
- Statistics on multilingual datasets☆17Jul 12, 2022Updated 3 years ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆32Jun 5, 2025Updated 11 months ago
- Materials for LOT School 2023, "Language Learning: A Data-Driven Approach"☆14Aug 14, 2024Updated last year
- EEG-MI signal classification DL model.☆14Apr 26, 2024Updated 2 years ago
- ☆18Feb 4, 2025Updated last year
- BPE modification that implements removing of the intermediate tokens during tokenizer training.☆27Nov 25, 2024Updated last year
- (BMVC2021, Oral) The repository offers the official implementation of our BMVC 2021 paper (oral) in PyTorch.☆18Apr 22, 2022Updated 4 years ago
- Three little Python scripts for data preparation: remove commas, add commas, concatenate files☆16Jul 26, 2017Updated 8 years ago
- Code for SaGe subword tokenizer (EACL 2023)☆28Nov 30, 2024Updated last year
- This is a Utrecht University dissertation template for LaTeX☆22Jul 31, 2025Updated 9 months ago
- An updated version of the Parser-v1 repo, used for Stanford's submission in the CoNLL17 shared task.☆45Aug 15, 2018Updated 7 years ago
- Find informative examples to efficiently (human)-evaluate NLG models.☆17Apr 22, 2026Updated last month
- ☆13Apr 16, 2021Updated 5 years ago