Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
☆13Nov 26, 2024Updated last year
Alternatives and similar repositories for tokenizers_intrinsic_benchmark
Users who are interested in tokenizers_intrinsic_benchmark are comparing it to the libraries listed below.
- PathPiece tokenizer☆14Nov 10, 2024Updated last year
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?"☆18May 15, 2025Updated 11 months ago
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated last year
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020).☆11May 1, 2025Updated last year
- Complete set of English dialect transformation rules and evaluation code☆17Jun 7, 2024Updated last year
- 🌍 A simple script for taking automated screenshots from a Leaflet map☆15Mar 29, 2018Updated 8 years ago
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings)☆13Feb 4, 2025Updated last year
- This repository includes pneumonia detection on Chest X-ray Images by using Deep Learning(Keras).☆21Nov 6, 2022Updated 3 years ago
- ☆10Nov 8, 2023Updated 2 years ago
- Code and models for the CVPR 2017 paper "DeepNav: Learning to Navigate Large Cities"☆13Feb 16, 2020Updated 6 years ago
- Split bib files for anthology bibliography for overleaf☆11Aug 25, 2024Updated last year
- PANiC - PAraphrasing Noun-Compounds☆15Apr 6, 2018Updated 8 years ago
- Demo server for TREC LiveQA competition☆11Dec 7, 2016Updated 9 years ago
- [EMNLP2025] Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling☆15Nov 20, 2025Updated 5 months ago
- Detail-Sensitive Panoramic Annular Semantic Segmentation☆12May 19, 2022Updated 3 years ago
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy☆15Jul 19, 2021Updated 4 years ago
- Code for the ILNewsDiff Twitter account☆10May 23, 2023Updated 2 years ago
- Simple-to-use scoring function for arbitrarily tokenized texts.☆48Feb 19, 2025Updated last year
- Event based Sign-Language-Translation☆19Feb 27, 2026Updated 2 months ago
- TensorFlow implementation of "Generating Sentences from a Continuous Space"☆11Sep 16, 2019Updated 6 years ago
- ☆26Sep 26, 2025Updated 7 months ago
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.☆13Jan 5, 2023Updated 3 years ago
- Dialogue Act classification☆18Jan 15, 2024Updated 2 years ago
- 🔄 ASCII / IPA conversion for Typst☆22Jan 8, 2026Updated 3 months ago
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th…☆22Mar 20, 2024Updated 2 years ago
- Code for "Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations" [NAACL Findings 2024]☆15Apr 3, 2026Updated last month
- Statistics on multilingual datasets☆17Jul 12, 2022Updated 3 years ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆32Jun 5, 2025Updated 11 months ago
- Materials for LOT School 2023, "Language Learning: A Data-Driven Approach"☆14Aug 14, 2024Updated last year
- EEG-MI signal classification DL model.☆14Apr 26, 2024Updated 2 years ago
- ☆18Feb 4, 2025Updated last year
- BPE modification that implements removing of the intermediate tokens during tokenizer training.☆27Nov 25, 2024Updated last year
- (BMVC2021, Oral) The repository offers the official implementation of our BMVC 2021 paper (oral) in PyTorch.☆18Apr 22, 2022Updated 4 years ago
- Three little Python scripts for data preparation: remove commas, add commas, concatenate files☆16Jul 26, 2017Updated 8 years ago
- Code for SaGe subword tokenizer (EACL 2023)☆28Nov 30, 2024Updated last year
- This is a Utrecht University dissertation template for LaTeX☆22Jul 31, 2025Updated 9 months ago
- An updated version of the Parser-v1 repo, used for Stanford's submission in the CoNLL17 shared task.☆45Aug 15, 2018Updated 7 years ago
- Find informative examples to efficiently (human)-evaluate NLG models.☆18Apr 22, 2026Updated 2 weeks ago
- ☆13Apr 16, 2021Updated 5 years ago