StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆67 · Updated 2 months ago
Alternatives and similar repositories for O1-Pruner:
Users interested in O1-Pruner are comparing it to the repositories listed below:
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆191 · Updated last month
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆133 · Updated last month
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆75 · Updated last week
- [NeurIPS 2024] Official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ☆115 · Updated last month
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆65 · Updated 2 months ago
- Code for the paper "A Sober Look at Progress in Language Model Reasoning" ☆36 · Updated last week
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆63 · Updated last month
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆95 · Updated last month
- Code for "Reasoning to Learn from Latent Thoughts" ☆91 · Updated 3 weeks ago
- [ICLR 2025] Code and data for the paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆44 · Updated 4 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆56 · Updated 2 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆175 · Updated last month
- A regularly updated paper list for LLM reasoning in latent space ☆72 · Updated this week
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆68 · Updated last month
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 · Updated 5 months ago
- Code associated with "Tuning Language Models by Proxy" (Liu et al., 2024) ☆109 · Updated last year
- Test-time preference optimization ☆114 · Updated 2 months ago
- Repo for the paper "Free Process Rewards without Process Labels" ☆143 · Updated last month
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆56 · Updated last month
- Code for the paper "Teaching Language Models to Critique via Reinforcement Learning" ☆94 · Updated last week
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 · Updated 3 months ago
- Official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆62 · Updated this week