ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆15Updated this week
Alternatives and similar repositories for evalhub
Users who are interested in evalhub are comparing it to the libraries listed below.
- ☆9Updated last week
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.☆17Updated 2 months ago
- Reproducing R1 for Code with Reliable Rewards☆221Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆202Updated 3 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆94Updated 3 months ago
- A Sober Look at Language Model Reasoning☆74Updated last week
- ☆104Updated 3 weeks ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆108Updated 6 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25]☆39Updated last month
- A version of verl to support tool use☆261Updated this week
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆73Updated last week
- Evaluation utilities based on SymPy.☆20Updated 6 months ago
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"☆25Updated last week
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"☆85Updated this week
- ☆71Updated 7 months ago
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference☆140Updated 2 weeks ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…☆79Updated 4 months ago
- ☆228Updated last month
- ☆30Updated last month
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton☆28Updated 4 months ago
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de…☆55Updated 11 months ago
- ☆43Updated this week
- ☆76Updated 4 months ago
- ☆58Updated last week
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"☆85Updated last week
- Repo of paper "Free Process Rewards without Process Labels"☆154Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning