ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆15 · Updated 2 weeks ago
Alternatives and similar repositories for evalhub
Users that are interested in evalhub are comparing it to the libraries listed below
- Reproducing R1 for Code with Reliable Rewards ☆190 · Updated last week
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25] ☆36 · Updated this week
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆26 · Updated 3 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆93 · Updated 2 months ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆78 · Updated 2 months ago
- ☆82 · Updated last week
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆105 · Updated 5 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆179 · Updated 2 months ago
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆18 · Updated 3 weeks ago
- Async pipelined version of Verl ☆78 · Updated last month
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆193 · Updated 9 months ago
- ☆168 · Updated last month
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆80 · Updated last week
- ☆61 · Updated last month
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆17 · Updated 3 weeks ago
- ☆28 · Updated last month
- ☆66 · Updated 5 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆83 · Updated 11 months ago
- ☆70 · Updated 2 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆236 · Updated last month
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification ☆53 · Updated 2 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆202 · Updated this week
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ☆54 · Updated last week
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆74 · Updated 2 months ago
- Evaluation utilities based on SymPy. ☆18 · Updated 5 months ago
- FR-Spec: Frequency-Ranked Speculative Sampling ☆22 · Updated last month
- Simple extension to vLLM to help you speed up reasoning models without training. ☆150 · Updated 2 weeks ago
- ☆196 · Updated 2 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆142 · Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆69 · Updated last month