cimeister / tokenizer-analysis-suite
☆40 · Updated last week
Alternatives and similar repositories for tokenizer-analysis-suite
Users who are interested in tokenizer-analysis-suite are comparing it to the libraries listed below.
- Simple-to-use scoring function for arbitrarily tokenized texts. ☆47 · Updated 9 months ago
- A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models. ☆105 · Updated 2 years ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆60 · Updated last year
- ☆65 · Updated 2 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Updated 7 months ago
- A python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆28 · Updated last year
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Updated last year
- State-of-the-art paired encoder and decoder models (17M-1B params) ☆53 · Updated 3 months ago
- The geometry of multilingual language model representations (EMNLP 2022). ☆22 · Updated 3 years ago
- ☆21 · Updated 2 months ago
- One-stop shop for running and fine-tuning transformer-based language models for retrieval ☆60 · Updated 2 weeks ago
- Rust library for indexing and quickly searching large pretraining corpora ☆30 · Updated last month
- A survey of corpora for Germanic low-resource languages and dialects ☆26 · Updated 11 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆27 · Updated 11 months ago
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021. ☆20 · Updated 7 months ago
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers ☆21 · Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer ☆54 · Updated 3 years ago
- Repository collecting resources and best practices to improve experimental rigour in deep learning research. ☆27 · Updated 2 years ago
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists" ☆50 · Updated 3 years ago
- Measuring the Mixing of Contextual Information in the Transformer ☆33 · Updated 2 years ago
- ☆36 · Updated 2 years ago
- Utilities for the HuggingFace transformers library ☆72 · Updated 2 years ago
- ☆101 · Updated 2 years ago
- A Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation, Levy et al., Findings of EMNLP 2021 ☆14 · Updated 3 years ago
- ☆220 · Updated 3 months ago
- Utility for behavioral and representational analyses of Language Models ☆171 · Updated 2 months ago
- The evaluation pipeline for the 2024 BabyLM Challenge. ☆33 · Updated last year
- Public repository for SemEval 2023 - Task 10 - Explainable Detection of Online Sexism (EDOS) ☆25 · Updated 2 years ago
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale, TACL (2022) ☆133 · Updated 5 months ago
- Code of NAACL 2022 "Efficient Hierarchical Domain Adaptation for Pretrained Language Models" paper. ☆32 · Updated 2 years ago