Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
☆13 · Nov 26, 2024 · Updated last year
Alternatives and similar repositories for tokenizers_intrinsic_benchmark
Users who are interested in tokenizers_intrinsic_benchmark are comparing it to the libraries listed below.
- Simple-to-use scoring function for arbitrarily tokenized texts. ☆47 · Feb 19, 2025 · Updated last year
- Complete set of English dialect transformation rules and evaluation code ☆16 · Jun 7, 2024 · Updated last year
- Code and models for the CVPR 2017 paper "DeepNav: Learning to Navigate Large Cities" ☆13 · Feb 16, 2020 · Updated 6 years ago
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings) ☆12 · Feb 4, 2025 · Updated last year
- Code for the ILNewsDiff Twitter account ☆10 · May 23, 2023 · Updated 2 years ago
- Tibetan Wylie transliteration ☆11 · Jul 19, 2016 · Updated 9 years ago
- ☆10 · Nov 8, 2023 · Updated 2 years ago
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020). ☆11 · May 1, 2025 · Updated 10 months ago
- The Open Multilingual Wordnet Project Page ☆14 · May 29, 2023 · Updated 2 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 11 months ago
- TensorFlow implementation of "Generating Sentences from a Continuous Space" ☆11 · Sep 16, 2019 · Updated 6 years ago
- 🌍 A simple script for taking automated screenshots from a Leaflet map ☆15 · Mar 29, 2018 · Updated 7 years ago
- A complete Tibetan dictionary. Commercial use is not permitted; any violation will result in legal action. ☆15 · Oct 15, 2023 · Updated 2 years ago
- BERT-CasRel | Roberta-GPlinker | BERT-BILSTM-CRF ☆11 · Apr 24, 2023 · Updated 2 years ago
- The repository describing a novel dataset for word association explanations ☆13 · Sep 21, 2023 · Updated 2 years ago
- ALTER: Auxiliary Text Rewriting Tool for Natural Language Generation ☆16 · Dec 10, 2022 · Updated 3 years ago
- Demo server for the TREC LiveQA competition ☆11 · Dec 7, 2016 · Updated 9 years ago
- Codebase for the Text-based NP Enrichment (TNE) paper ☆19 · Mar 12, 2024 · Updated last year
- Detail-Sensitive Panoramic Annular Semantic Segmentation ☆12 · May 19, 2022 · Updated 3 years ago
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 · May 15, 2025 · Updated 9 months ago
- Finds snippets in iambic pentameter in English-language text and tries to combine them into a rhyming sonnet. ☆13 · Jan 5, 2023 · Updated 3 years ago
- Using BERT to obtain Chinese character and word vectors ☆10 · Jan 18, 2022 · Updated 4 years ago
- Split .bib files of the Anthology bibliography for Overleaf ☆11 · Aug 25, 2024 · Updated last year
- The figures for the Deep Learning textbook (www.deeplearningbook.org) ☆17 · Oct 9, 2017 · Updated 8 years ago
- Chinese medical NLP: datasets, papers, knowledge graphs, corpora, and toolkits ☆12 · Oct 15, 2020 · Updated 5 years ago
- Event-based Sign-Language-Translation ☆19 · Feb 27, 2026 · Updated last week
- EEG-MI signal classification DL model. ☆14 · Apr 26, 2024 · Updated last year
- ☆13 · Apr 16, 2021 · Updated 4 years ago
- PathPiece tokenizer