ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆15Updated 3 weeks ago
Alternatives and similar repositories for evalhub
Users who are interested in evalhub are comparing it to the libraries listed below.
- ☆9Updated last month
- Reproducing R1 for Code with Reliable Rewards☆246Updated 3 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25]☆45Updated last month
- ☆29Updated this week
- Async pipelined version of Verl☆112Updated 4 months ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…☆81Updated 5 months ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning☆74Updated last week
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality☆205Updated last year
- ☆114Updated 2 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆111Updated 8 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆100Updated last week
- Source code for the paper "LongGenBench: Long-context Generation Benchmark"☆22Updated 10 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton☆30Updated 6 months ago
- A version of verl to support tool use☆315Updated this week
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.☆17Updated 3 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆218Updated 5 months ago
- ☆40Updated 2 months ago
- ☆263Updated 2 months ago
- ☆26Updated 6 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆162Updated last week
- ☆55Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆241Updated last year
- ☆71Updated 8 months ago
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.☆40Updated 3 months ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**☆200Updated 6 months ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…☆34Updated this week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆245Updated 3 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs☆144Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆237Updated 2 months ago
- LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation☆26Updated last month