iamhankai / Forest-of-Thought
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
☆38 · Updated 2 months ago
Alternatives and similar repositories for Forest-of-Thought:
Users interested in Forest-of-Thought are comparing it to the libraries listed below:
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆102 · Updated last week
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆71 · Updated 3 weeks ago
- ☆107 · Updated 2 weeks ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆169 · Updated 3 weeks ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆99 · Updated last month
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆90 · Updated last month
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆75 · Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆180 · Updated last month
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning" ☆60 · Updated this week
- ☆91 · Updated last month
- ☆62 · Updated 4 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆110 · Updated this week
- ☆184 · Updated last month
- ☆101 · Updated 4 months ago
- ☆35 · Updated last month
- ☆118 · Updated 10 months ago
- ☆78 · Updated this week
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆68 · Updated 3 weeks ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior ☆222 · Updated 2 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆42 · Updated 9 months ago
- Code for the paper "Teaching Language Models to Critique via Reinforcement Learning" ☆90 · Updated last week
- A tiny reproduction of DeepSeek-R1-Zero on two A100s ☆58 · Updated 2 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆54 · Updated 2 months ago
- ☆106 · Updated 2 months ago
- Repo of the paper "Free Process Rewards without Process Labels" ☆141 · Updated last month
- A comprehensive collection of process reward models ☆53 · Updated last week
- Reformatted Alignment ☆115 · Updated 6 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 · Updated 3 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆136 · Updated 2 months ago