guijinSON / MM-Eval
Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"
☆14 Updated 9 months ago
Alternatives and similar repositories for MM-Eval
Users who are interested in MM-Eval are comparing it to the repositories listed below.
- Official Code for M-RᴇᴡᴀʀᴅBᴇɴᴄʜ: Evaluating Reward Models in Multilingual Settings (ACL 2025 Main) ☆33 Updated 2 months ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆93 Updated 9 months ago
- [ICLR 2022] Towards Continual Knowledge Learning of Language Models ☆92 Updated 2 years ago
- ☆184 Updated last month
- Code for the paper "Reasoning Models Better Express Their Confidence" ☆17 Updated 2 months ago
- ☆62 Updated 2 years ago
- ☆52 Updated 2 years ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆73 Updated last year
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 Updated 3 years ago
- ☆96 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆128 Updated 11 months ago
- ☆22 Updated 2 years ago
- ☆87 Updated 2 years ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" ☆12 Updated 4 months ago
- [ICLR 2025] General-purpose activation steering library ☆88 Updated 2 weeks ago
- AI Logging for Interpretability and Explainability🔬 ☆125 Updated last year
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆94 Updated 3 years ago
- The geometry of multilingual language model representations (EMNLP 2022). ☆21 Updated 2 years ago
- [NeurIPS 2024] How do Large Language Models Handle Multilingualism? ☆37 Updated 9 months ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆38 Updated 2 years ago
- Repository for research in the field of Responsible NLP at Meta. ☆202 Updated 2 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆78 Updated last year
- Measuring the Mixing of Contextual Information in the Transformer ☆31 Updated 2 years ago
- ☆76 Updated last year
- ☆23 Updated last year
- Code for Zero-Shot Tokenizer Transfer ☆135 Updated 6 months ago
- ☆27 Updated last year
- ☆75 Updated last year
- Simple and scalable tools for data-driven pretraining data selection. ☆25 Updated 2 months ago
- ☆11 Updated last year