sastpg / RFTTLinks
RFTT: Reasoning with Reinforced Functional Token Tuning
☆27Updated 2 months ago
Alternatives and similar repositories for RFTT
Users that are interested in RFTT are comparing it to the libraries listed below
Sorting:
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆69Updated 3 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆65Updated last month
- A Sober Look at Language Model Reasoning☆63Updated last week
- The official code of paper “Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning”☆117Updated this week
- ☆46Updated 7 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆123Updated 2 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆54Updated 6 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆73Updated this week
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆22Updated 3 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆103Updated this week
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆95Updated 2 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆94Updated 3 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆54Updated 2 weeks ago
- The official code repository for PRMBench.☆73Updated 3 months ago
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?☆28Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆70Updated 2 months ago
- ☆131Updated 3 weeks ago
- A version of verl to support tool use☆172Updated this week
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆36Updated 2 weeks ago
- A comprehensive collection of process reward models.☆88Updated 2 weeks ago
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"☆19Updated last week
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆69Updated last week
- This my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…☆35Updated 2 months ago
- ☆22Updated 11 months ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆58Updated 2 months ago
- ☆231Updated last week
- Repo of paper "Free Process Rewards without Process Labels"☆149Updated 2 months ago
- ☆45Updated 3 months ago
- ☆19Updated 3 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆98Updated last month