MeLeLBGU / tokenizers_intrinsic_benchmark
Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
☆10 · Updated 10 months ago
Alternatives and similar repositories for tokenizers_intrinsic_benchmark
Users interested in tokenizers_intrinsic_benchmark are comparing it to the libraries listed below.
- Diagnostic tests for linguistic capacities in language models ☆65 · Updated 3 years ago
- Automated Semantic Analysis of Discourse Markers ☆10 · Updated 3 years ago
- A simple library for querying the URIEL typological database. ☆90 · Updated last year
- A neural word aligner based on multilingual BERT ☆357 · Updated 3 years ago
- ☆230 · Updated 4 years ago
- Utility for behavioral and representational analyses of Language Models ☆162 · Updated 2 weeks ago
- Repository for DISRPT2023 shared task ☆17 · Updated last year
- ☆54 · Updated 3 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆376 · Updated last year
- A tool for holistic analysis of language generation systems ☆472 · Updated 2 weeks ago
- Easier Automatic Sentence Simplification Evaluation ☆161 · Updated 2 years ago
- ☆15 · Updated 3 years ago
- Find informative examples to efficiently (human)-evaluate NLG models. ☆16 · Updated this week
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs ☆184 · Updated 2 years ago
- ☆33 · Updated last month
- This repository houses the IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated se… ☆19 · Updated 4 years ago
- Dump the text of the Gigaword dataset into a single file, for use with language modeling (and other!) toolkits ☆23 · Updated 8 years ago
- Efficient Low-Memory Aligner ☆146 · Updated 8 months ago
- A tool for calculating character n-gram F score ☆74 · Updated 2 years ago
- The Benchmark of Linguistic Minimal Pairs ☆154 · Updated 2 years ago
- MT Evaluation in Many Languages via Zero-Shot Paraphrasing ☆102 · Updated last year
- An initiative to collect and distribute resources for co-reference resolution in a unified standard. ☆25 · Updated last year
- Code and data for "A fine-grained comparison of pragmatic language understanding in humans and language models" ☆11 · Updated 2 years ago
- [Kauf & Ivanova, ACL 2023] A Better Way to Do Masked Language Model Scoring ☆10 · Updated last year
- PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran… ☆18 · Updated 2 years ago
- Neural CRF Model for Sentence Alignment in Text Simplification ☆68 · Updated 8 months ago
- Lexical Substitution Framework ☆46 · Updated 2 years ago
- Appraise code used as part of WMT21 human evaluation campaign ☆28 · Updated last week
- ☆10 · Updated 3 years ago
- A tool that locates, downloads, and extracts machine translation corpora ☆158 · Updated 3 weeks ago