ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆15 Updated this week
Alternatives and similar repositories for evalhub
Users who are interested in evalhub are comparing it to the libraries listed below.
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆17 Updated last month
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25] ☆37 Updated 3 weeks ago
- Reproducing R1 for Code with Reliable Rewards ☆208 Updated last month
- ☆95 Updated 2 weeks ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆84 Updated last year
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆19 Updated last week
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆95 Updated 2 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆31 Updated last month
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆78 Updated 3 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆28 Updated 3 months ago
- ☆231 Updated last week
- A version of verl to support tool use ☆172 Updated this week
- Async pipelined version of Verl ☆91 Updated 2 months ago
- A Sober Look at Language Model Reasoning ☆63 Updated last week
- ☆63 Updated 6 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆147 Updated this week
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆186 Updated 3 months ago
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆69 Updated last week
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆29 Updated this week
- ☆69 Updated 6 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆220 Updated last year
- ☆75 Updated 3 months ago
- 🔥 A minimal training framework for scaling FLA models ☆146 Updated 3 weeks ago
- ☆64 Updated last month
- Repo of paper "Free Process Rewards without Process Labels" ☆149 Updated 2 months ago
- ☆210 Updated 2 weeks ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆107 Updated this week
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆195 Updated 10 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆136 Updated 8 months ago
- A collection of papers on discrete diffusion models ☆121 Updated last week