Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
☆13 · Nov 26, 2024 · Updated last year
Alternatives and similar repositories for tokenizers_intrinsic_benchmark
Users who are interested in tokenizers_intrinsic_benchmark are comparing it to the libraries listed below.
- PathPiece tokenizer ☆14 · Nov 10, 2024 · Updated last year
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 · May 15, 2025 · Updated 10 months ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 11 months ago
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020). ☆11 · May 1, 2025 · Updated 10 months ago
- Complete set of English dialect transformation rules and evaluation code ☆16 · Jun 7, 2024 · Updated last year
- 🌍 A simple script for taking automated screenshots from a Leaflet map ☆15 · Mar 29, 2018 · Updated 7 years ago
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings) ☆12 · Feb 4, 2025 · Updated last year
- Pneumonia detection on chest X-ray images using deep learning (Keras). ☆21 · Nov 6, 2022 · Updated 3 years ago
- ☆10 · Nov 8, 2023 · Updated 2 years ago
- Code and models for the CVPR 2017 paper "DeepNav: Learning to Navigate Large Cities" ☆13 · Feb 16, 2020 · Updated 6 years ago
- Split bib files for the anthology bibliography, for use with Overleaf ☆11 · Aug 25, 2024 · Updated last year
- PANiC - PAraphrasing Noun-Compounds ☆15 · Apr 6, 2018 · Updated 7 years ago
- Demo server for the TREC LiveQA competition ☆11 · Dec 7, 2016 · Updated 9 years ago
- Detail-Sensitive Panoramic Annular Semantic Segmentation ☆12 · May 19, 2022 · Updated 3 years ago
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy ☆15 · Jul 19, 2021 · Updated 4 years ago
- Code for the ILNewsDiff Twitter account ☆10 · May 23, 2023 · Updated 2 years ago
- Simple-to-use scoring function for arbitrarily tokenized texts. ☆48 · Feb 19, 2025 · Updated last year
- Event-based Sign Language Translation ☆19 · Feb 27, 2026 · Updated last month
- TensorFlow implementation of "Generating Sentences from a Continuous Space" ☆11 · Sep 16, 2019 · Updated 6 years ago
- ☆24 · Sep 26, 2025 · Updated 6 months ago
- Finds snippets in iambic pentameter in English-language text and tries to combine them into a rhyming sonnet. ☆13 · Jan 5, 2023 · Updated 3 years ago
- Dialogue act classification ☆18 · Jan 15, 2024 · Updated 2 years ago
- 🔄 ASCII / IPA conversion for Typst ☆22 · Jan 8, 2026 · Updated 2 months ago
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Mar 20, 2024 · Updated 2 years ago
- (NAACL 2024) Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations ☆15 · Apr 14, 2025 · Updated 11 months ago
- Statistics on multilingual datasets ☆17 · Jul 12, 2022 · Updated 3 years ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 · Jun 5, 2025 · Updated 9 months ago
- Materials for LOT School 2023, "Language Learning: A Data-Driven Approach" ☆14 · Aug 14, 2024 · Updated last year
- Deep learning (DL) model for EEG-MI signal classification. ☆14 · Apr 26, 2024 · Updated last year
- ☆18 · Feb 4, 2025 · Updated last year
- BPE modification that removes intermediate tokens during tokenizer training. ☆27 · Nov 25, 2024 · Updated last year
- (BMVC 2021, Oral) Official PyTorch implementation of our BMVC 2021 oral paper. ☆18 · Apr 22, 2022 · Updated 3 years ago
- Three little Python scripts for data preparation: remove commas, add commas, concatenate files ☆16 · Jul 26, 2017 · Updated 8 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆28 · Nov 30, 2024 · Updated last year
- This is a Utrecht University dissertation template for LaTeX ☆22 · Jul 31, 2025 · Updated 7 months ago
- An updated version of the Parser-v1 repo, used for Stanford's submission in the CoNLL17 shared task. ☆45 · Aug 15, 2018 · Updated 7 years ago
- Find informative examples to efficiently (human)-evaluate NLG models. ☆18 · Feb 27, 2026 · Updated last month
- ☆13 · Apr 16, 2021 · Updated 4 years ago
- [ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric" ☆36 · Feb 10, 2026 · Updated last month