ysy-phoenix/evalhub
All-in-one benchmarking platform for evaluating LLMs.
☆11 · Updated this week
Alternatives and similar repositories for evalhub:
Users interested in evalhub are comparing it to the repositories listed below.
- Reproducing R1 for Code with Reliable Rewards ☆140 · Updated 3 weeks ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆67 · Updated this week
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆17 · Updated last month
- 🔥 How to efficiently and effectively compress the CoTs or directly generate concise CoTs during inference while maintaining the reasonin… ☆23 · Updated this week
- Evaluation utilities based on SymPy. ☆16 · Updated 3 months ago
- ☆20 · Updated this week
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆67 · Updated last month
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆99 · Updated 3 months ago
- ☆39 · Updated 4 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆113 · Updated this week
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆23 · Updated last month
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de… ☆49 · Updated 8 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆166 · Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆162 · Updated last week
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆78 · Updated 2 weeks ago
- Paper list for Efficient Reasoning. ☆331 · Updated this week
- Blog posts, reading reports, and code examples covering AGI/LLM-related knowledge. ☆36 · Updated last month
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆89 · Updated 2 weeks ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆103 · Updated 2 weeks ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆191 · Updated 11 months ago
- ☆72 · Updated this week
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆65 · Updated last month
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆75 · Updated 9 months ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆171 · Updated this week
- Multi-Candidate Speculative Decoding ☆35 · Updated 11 months ago
- ☆47 · Updated 3 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆170 · Updated 3 weeks ago
- Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding" ☆176 · Updated last month
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆179 · Updated 7 months ago
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation ☆45 · Updated last week