Aleph-Alpha-Research / eval-framework
Comprehensive LLM evaluation at scale: A production-ready framework for evaluating large language models across multiple benchmarks.
☆36 Updated last week
Alternatives and similar repositories for eval-framework
Users who are interested in eval-framework are comparing it to the libraries listed below.
- Measuring the Mixing of Contextual Information in the Transformer ☆34 Updated 2 years ago
- Official Code for M-RᴇᴡᴀʀᴅBᴇɴᴄʜ: Evaluating Reward Models in Multilingual Settings (ACL 2025 Main) ☆40 Updated 8 months ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆36 Updated 8 months ago
- ☆45 Updated last year
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆60 Updated last year
- Resources for cultural NLP research ☆113 Updated 4 months ago
- A repository containing the code for translating popular LLM benchmarks to German. ☆31 Updated 2 years ago
- Benchmark API for Multidomain Language Modeling ☆25 Updated 3 years ago
- This repository contains an extension of fairseq for pixel / visual representations of text for machine translation. ☆37 Updated 2 years ago
- A software for transferring pre-trained English models to foreign languages ☆19 Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer ☆142 Updated last year
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆23 Updated last year
- MAFAND-MT ☆60 Updated last year
- A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings. ☆75 Updated last year
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks. ☆125 Updated 3 months ago
- ☆20 Updated last year
- ☆263 Updated 6 months ago
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond ☆12 Updated 3 years ago
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers ☆21 Updated 2 years ago
- Automatic metrics for GEM tasks ☆67 Updated 3 years ago
- A curated list of research papers and resources on Cultural LLM. ☆53 Updated last year
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆87 Updated last year
- ☆83 Updated 11 months ago
- ☆87 Updated last year
- ☆103 Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆107 Updated last year
- Official implementation of "GPT or BERT: why not both?" ☆61 Updated 6 months ago
- Codebase describing experiments in Truncation Sampling as Language Model Desmoothing ☆13 Updated 3 years ago
- Simple-to-use scoring function for arbitrarily tokenized texts. ☆47 Updated 11 months ago
- ☆132 Updated 2 weeks ago