MilkThink-Lab / RouterEval
A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in Large Language Models
☆44Updated 3 months ago
Alternatives and similar repositories for RouterEval
Users that are interested in RouterEval are comparing it to the libraries listed below
- A curated list of awesome works in Routing LLMs paradigm (👉 Welcome to submit your contributions to this code repository)☆40Updated last month
- A Comprehensive Survey on Long Context Language Modeling☆152Updated 3 weeks ago
- ☆63Updated 7 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆241Updated 2 months ago
- An Open Math Pre-training Dataset with 370B Tokens.☆89Updated 2 months ago
- ☆103Updated 6 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆113Updated 2 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆75Updated 3 weeks ago
- ☆152Updated last month
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆189Updated 3 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆73Updated 4 months ago
- ☆67Updated 3 weeks ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆133Updated last year
- An Awesome List of Reinforcement Learning-based Large Language Agent Works. Collected directly from official codebases.☆154Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆240Updated 3 weeks ago
- ☆142Updated 11 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆250Updated 6 months ago
- ☆53Updated last week
- ☆191Updated 2 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆244Updated 7 months ago
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents☆96Updated 2 months ago
- ☆273Updated 3 weeks ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆216Updated 4 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆222Updated last month
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆105Updated last month
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆141Updated this week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆131Updated 2 months ago
- ☆150Updated last week
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆251Updated 3 weeks ago
- Reproducing R1 for Code with Reliable Rewards☆221Updated last month