cimeister / tokenizer-analysis-suite
☆30 Updated last week
Alternatives and similar repositories for tokenizer-analysis-suite
Users who are interested in tokenizer-analysis-suite are comparing it to the libraries listed below.
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆58 Updated last year
- Simple-to-use scoring function for arbitrarily tokenized texts. ☆46 Updated 6 months ago
- German Language Understanding Evaluation Benchmark @NAACL24 ☆15 Updated last month
- ☆66 Updated 2 years ago
- The evaluation pipeline for the 2024 BabyLM Challenge. ☆33 Updated 9 months ago
- Find informative examples to efficiently (human)-evaluate NLG models. ☆16 Updated 3 weeks ago
- A python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆28 Updated 11 months ago
- A software for transferring pre-trained English models to foreign languages ☆18 Updated 2 years ago
- A Python utility for indexing file lines. Best demo honourable mention at ECIR 2024. ☆23 Updated last year
- A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models. ☆104 Updated last year
- Rust library for indexing and quickly searching large pretraining corpora ☆28 Updated last week
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆21 Updated last year
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 Updated 4 months ago
- One-stop shop for running and fine-tuning transformer-based language models for retrieval ☆59 Updated this week
- ☆19 Updated 2 weeks ago
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" ☆49 Updated 3 years ago
- A survey of corpora for Germanic low-resource languages and dialects ☆25 Updated 9 months ago
- Repository collecting resources and best practices to improve experimental rigour in deep learning research. ☆27 Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 Updated 2 years ago
- State-of-the-art paired encoder and decoder models (17M-1B params) ☆44 Updated 3 weeks ago
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers ☆21 Updated 2 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆33 Updated 2 months ago
- Utilities for the HuggingFace transformers library ☆70 Updated 2 years ago
- CD20200004 from 01/01/2021 to 31/12/2023 - LIG UGA - Python Notebook and Models for the MT Lab @ ALPS 2022 ☆13 Updated last year
- Measuring the Mixing of Contextual Information in the Transformer ☆31 Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆26 Updated 9 months ago
- Query-focused summarization data ☆42 Updated 2 years ago
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale, TACL (2022) ☆128 Updated 2 months ago
- ☆53 Updated last year
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆14 Updated last month