yafuly / TPO
☆106 · Updated last month
Alternatives and similar repositories for TPO:
Users interested in TPO are comparing it to the repositories listed below.
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 · Updated last month
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ☆73 · Updated 2 months ago
- Official repository for the paper "O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning" ☆64 · Updated last month
- Reformatted Alignment ☆115 · Updated 6 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆148 · Updated last week
- The official repository of the Omni-MATH benchmark ☆77 · Updated 3 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 · Updated 2 months ago
- The demo, code, and data of FollowRAG ☆70 · Updated 3 months ago
- ScaleQuest: a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs ☆60 · Updated 5 months ago
- The implementation of LeCo ☆32 · Updated 2 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆98 · Updated 2 weeks ago
- [NeurIPS 2024] The official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ☆104 · Updated last week
- Code for the paper "Teaching Language Models to Critique via Reinforcement Learning" ☆84 · Updated last month
- The official code repository for PRMBench ☆68 · Updated last month
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆51 · Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆62 · Updated last month
- Implementation of the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆52 · Updated 3 months ago
- A Survey on Efficient Reasoning for LLMs ☆116 · Updated this week
- Repository of the paper "Free Process Rewards without Process Labels" ☆138 · Updated 2 weeks ago
- Official repository for the paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆73 · Updated 9 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆107 · Updated 11 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" ☆116 · Updated 4 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated last year
- A Comprehensive Survey on Long Context Language Modeling ☆86 · Updated last week
- Large Language Models Can Self-Improve in Long-context Reasoning ☆67 · Updated 4 months ago