ysy-phoenix / evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆15 · Updated 3 months ago
Alternatives and similar repositories for evalhub
Users interested in evalhub are comparing it to the repositories listed below.
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆61 · Updated 4 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆289 · Updated 3 months ago
- ☆50 · Updated 5 months ago
- Reproducing R1 for Code with Reliable Rewards ☆286 · Updated 9 months ago
- [ASPLOS '26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ☆131 · Updated 2 months ago
- Spectral Sphere Optimizer ☆94 · Updated 3 weeks ago
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long-Context-Capable Approaches [EMNLP Findings 2024] ☆88 · Updated 11 months ago
- ☆129 · Updated 8 months ago
- Async pipelined version of Verl ☆124 · Updated 10 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 · Updated 9 months ago
- A lightweight inference engine built for block diffusion models ☆40 · Updated 2 months ago
- ☆53 · Updated 8 months ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆191 · Updated last week
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆104 · Updated 4 months ago
- [NeurIPS '25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆87 · Updated 2 months ago
- ☆42 · Updated 10 months ago
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆248 · Updated last year
- ☆55 · Updated 7 months ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ☆73 · Updated 6 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆229 · Updated last year
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆192 · Updated 4 months ago
- [ICML 2025] Reward-Guided Speculative Decoding (RSD) for efficiency and effectiveness ☆55 · Updated 9 months ago
- ☆20 · Updated 4 months ago
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆116 · Updated 6 months ago
- ☆111 · Updated 4 months ago
- NexRL is an ultra-loosely-coupled LLM post-training framework. ☆97 · Updated last week
- [NeurIPS 2025] A simple extension of vLLM to help you speed up reasoning models without training ☆220 · Updated 8 months ago
- Source code for the paper "LongGenBench: Long-Context Generation Benchmark" ☆24 · Updated last year
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆49 · Updated 6 months ago
- [NeurIPS '24] Official code for 🎯 DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving ☆120 · Updated last year