MeLeLBGU / tokenizers_intrinsic_benchmark
Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
☆13 · Nov 26, 2024 · Updated last year
Alternatives and similar repositories for tokenizers_intrinsic_benchmark
Users who are interested in tokenizers_intrinsic_benchmark compare it to the repositories listed below
- Simple-to-use scoring function for arbitrarily tokenized texts. ☆47 · Feb 19, 2025 · Updated 11 months ago
- Code for the ILNewsDiff Twitter account ☆10 · May 23, 2023 · Updated 2 years ago
- ☆10 · Nov 8, 2023 · Updated 2 years ago
- Complete set of English dialect transformation rules and evaluation code ☆16 · Jun 7, 2024 · Updated last year
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings) ☆12 · Feb 4, 2025 · Updated last year
- Tibetan Wylie transliteration ☆11 · Jul 19, 2016 · Updated 9 years ago
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020). ☆11 · May 1, 2025 · Updated 9 months ago
- Code and models for the CVPR 2017 paper "DeepNav: Learning to Navigate Large Cities" ☆13 · Feb 16, 2020 · Updated 5 years ago
- A complete collection of Tibetan dictionaries. Not for commercial use; violations will lead to legal action. ☆15 · Oct 15, 2023 · Updated 2 years ago
- The repository describing a novel dataset for word association explanations ☆13 · Sep 21, 2023 · Updated 2 years ago
- The Open Multilingual Wordnet Project Page ☆14 · May 29, 2023 · Updated 2 years ago
- 🌍 A simple script for taking automated screenshots from a Leaflet map ☆15 · Mar 29, 2018 · Updated 7 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 10 months ago
- ALTER: Auxiliary Text Rewriting Tool for Natural Language Generation ☆16 · Dec 10, 2022 · Updated 3 years ago
- BERT-CasRel | Roberta-GPlinker | BERT-BILSTM-CRF ☆11 · Apr 24, 2023 · Updated 2 years ago
- TensorFlow implementation of "Generating Sentences from a Continuous Space" ☆11 · Sep 16, 2019 · Updated 6 years ago
- EEG-MI signal classification DL model. ☆14 · Apr 26, 2024 · Updated last year
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. ☆13 · Jan 5, 2023 · Updated 3 years ago
- Split bib files for anthology bibliography for overleaf ☆11 · Aug 25, 2024 · Updated last year
- codebase for the Text-based NP Enrichment (TNE) paper ☆19 · Mar 12, 2024 · Updated last year
- Detail-Sensitive Panoramic Annular Semantic Segmentation ☆12 · May 19, 2022 · Updated 3 years ago
- Obtain Chinese character and word vectors using BERT ☆10 · Jan 18, 2022 · Updated 4 years ago
- Demo server for TREC LiveQA competition ☆11 · Dec 7, 2016 · Updated 9 years ago
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 · May 15, 2025 · Updated 9 months ago
- PANiC - PAraphrasing Noun-Compounds ☆15 · Apr 6, 2018 · Updated 7 years ago
- Event based Sign-Language-Translation ☆18 · Jan 6, 2026 · Updated last month
- ACL Paper Lists (machine translation) ☆13 · Mar 23, 2022 · Updated 3 years ago
- The figures for the Deep Learning textbook (www.deeplearningbook.org) ☆17 · Oct 9, 2017 · Updated 8 years ago
- PathPiece tokenizer ☆13 · Nov 10, 2024 · Updated last year
- ☆12 · Jun 24, 2019 · Updated 6 years ago
- ☆13 · Apr 16, 2021 · Updated 4 years ago
- Chinese medical NLP: datasets, papers, knowledge graphs, corpora, and toolkits ☆12 · Oct 15, 2020 · Updated 5 years ago
- Using BERT models for classification ☆15 · Apr 4, 2020 · Updated 5 years ago
- Find informative examples to efficiently (human)-evaluate NLG models. ☆18 · Feb 6, 2026 · Updated last week
- 🔄 ASCII / IPA conversion for Typst ☆22 · Jan 8, 2026 · Updated last month
- (NAACL 2024) Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations ☆15 · Apr 14, 2025 · Updated 10 months ago
- Chinese entity extraction ☆14 · Aug 25, 2018 · Updated 7 years ago
- MudBlazor Template dotnet new mudblazor --interactivity Auto --auth Individual --all-interactive, installed with Secure API based on role… ☆17 · Oct 9, 2024 · Updated last year
- A simple Chinese NER implementation using Bert-Bi-LSTM-CRF on the fastNLP framework ☆15 · Sep 25, 2020 · Updated 5 years ago