aryopg / mmlu-redux
☆11Updated 3 months ago
Alternatives and similar repositories for mmlu-redux:
Users who are interested in mmlu-redux are comparing it to the repositories listed below
- official implementation of paper "Process Reward Model with Q-value Rankings"☆48Updated 2 weeks ago
- ☆41Updated 9 months ago
- ☆23Updated 5 months ago
- Lottery Ticket Adaptation☆37Updated 2 months ago
- Long Context Extension and Generalization in LLMs☆48Updated 4 months ago
- o1 Chain of Thought Examples☆33Updated 4 months ago
- Minimal implementation of the Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models paper (arXiv 2401.01335)☆29Updated 11 months ago
- Codebase for Instruction Following without Instruction Tuning☆33Updated 4 months ago
- This is the official project for our paper: Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers☆29Updated last year
- ☆58Updated this week
- Scalable Meta-Evaluation of LLMs as Evaluators☆43Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated 11 months ago
- ☆20Updated 8 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy☆48Updated 2 months ago
- CodeUltraFeedback: aligning large language models to coding preferences☆68Updated 7 months ago
- ☆55Updated 3 months ago
- ☆95Updated 7 months ago
- Critique-out-Loud Reward Models☆51Updated 4 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization"☆80Updated 11 months ago
- ☆53Updated 4 months ago
- ☆39Updated 6 months ago
- Benchmarking Benchmark Leakage in Large Language Models☆49Updated 8 months ago
- ☆92Updated 3 weeks ago
- The official repository of the Omni-MATH benchmark.☆71Updated last month
- Code for paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts"☆18Updated 8 months ago
- ☆12Updated last month
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)☆32Updated 3 months ago