Guangxuan-Xiao / GSM8K-eval
☆55 Updated 2 years ago
Alternatives and similar repositories for GSM8K-eval
Users who are interested in GSM8K-eval are comparing it to the repositories listed below.
- ☆71 Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆270 Updated last year
- ☆77 Updated last year
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆50 Updated 9 months ago
- ☆219 Updated 9 months ago
- ☆201 Updated 2 weeks ago
- A Sober Look at Language Model Reasoning ☆92 Updated last month
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆114 Updated 5 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆119 Updated last year
- Repo of paper "Free Process Rewards without Process Labels" ☆168 Updated 9 months ago
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" ☆146 Updated 2 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆253 Updated last week
- Model merging is a highly efficient approach for long-to-short reasoning. ☆96 Updated 2 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆197 Updated last month
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆127 Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆134 Updated 9 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆226 Updated last year
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆93 Updated last year
- Paper list on decoding methods for LLMs and LVLMs ☆66 Updated 2 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆94 Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆151 Updated 10 months ago
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆71 Updated 9 months ago
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning ☆96 Updated 10 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆407 Updated 6 months ago
- ☆215 Updated 10 months ago
- [AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆41 Updated 7 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆50 Updated 6 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆150 Updated last year
- ☆46 Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆257 Updated 7 months ago