MilkThink-Lab / RouterEval
A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in Large Language Models
☆38Updated last month
Alternatives and similar repositories for RouterEval:
Users who are interested in RouterEval are comparing it to the libraries listed below
- A curated list of awesome works in Routing LLMs paradigm (👉 Welcome to submit your contributions to this code repository)☆30Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆229Updated last week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆173Updated last month
- A Comprehensive Survey on Long Context Language Modeling☆131Updated last month
- An Open Math Pre-training Dataset with 370B Tokens.☆72Updated 3 weeks ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆75Updated last week
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆252Updated last year
- The RedStone repository includes code for preparing extensive datasets used in training large language models.☆131Updated 2 months ago
- a-m-team's exploration in large language modeling☆49Updated 3 weeks ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆148Updated 7 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆236Updated last week
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents☆87Updated 3 weeks ago
- On Memorization of Large Language Models in Logical Reasoning☆65Updated 3 weeks ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆64Updated last week
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.☆143Updated last week
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆240Updated 5 months ago
- Reproducing R1 for Code with Reliable Rewards☆179Updated this week
- Test-time preference optimization.☆114Updated 2 months ago