satori-reasoning/Satori

[ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

☆115

Alternatives and similar repositories for Satori

Users interested in Satori are comparing it to the repositories listed below.

satori-reasoning / Satori-SWE
☆21 · May 30, 2025 · Updated last year
QizhiPei / MathFusion
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)
☆37 · Jul 16, 2025 · Updated last year
MasterVito / SwS
Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning
☆42 · Nov 11, 2025 · Updated 8 months ago
Gen-Verse / ReasonFlux
[NeurIPS 2025 Spotlight] LLM post-training suite, featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder.
☆540 · Sep 27, 2025 · Updated 9 months ago
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
Simplified-Reasoning / TPO
Test-time preference optimization (ICML 2025).
☆185 · May 8, 2025 · Updated last year
ars22 / e3
☆20 · Sep 16, 2025 · Updated 10 months ago
sparkle-reasoning / sparkle
[NeurIPS'25] Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
☆16 · Dec 12, 2025 · Updated 7 months ago
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 · Apr 11, 2026 · Updated 3 months ago
PRIME-RL / PRIME
Scalable RL solution for advanced reasoning of language models
☆1,865 · Mar 18, 2025 · Updated last year
nishadsinghi / sc-genrm-scaling
[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…
☆15 · Oct 31, 2025 · Updated 8 months ago
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆190 · Mar 20, 2025 · Updated last year
zhaoxlpku / PromptCoT
☆17 · Apr 10, 2025 · Updated last year
GAIR-NLP / LIMO
[COLM 2025] LIMO: Less is More for Reasoning
☆1,080 · Jul 30, 2025 · Updated 11 months ago
stellalisy / PrefPalette
☆21 · Apr 3, 2026 · Updated 3 months ago
MasterVito / DAC-RL
Official Repo for DAC-RL: Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
☆16 · Feb 26, 2026 · Updated 4 months ago
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
michaelchen-lab / caft-llm
Improving large language models with concept-aware fine-tuning (CAFT)
☆29 · Jan 31, 2026 · Updated 5 months ago
rookie-joe / AutoPSV
☆50 · Oct 28, 2024 · Updated last year
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆32 · Apr 20, 2024 · Updated 2 years ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆263 · May 14, 2025 · Updated last year
WPR001 / Ego-ST
☆16 · Sep 25, 2025 · Updated 9 months ago
LAMDA-NeSy / Self-Backtracking
☆52 · Feb 12, 2025 · Updated last year
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆460 · Mar 20, 2026 · Updated 4 months ago
esteng / regal_program_learning
☆27 · Sep 11, 2024 · Updated last year
ericjiang18 / EnergyORM
☆15 · Jun 5, 2025 · Updated last year
Raibows / CREAM
Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.
☆29 · Feb 17, 2025 · Updated last year
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,870 · Dec 23, 2025 · Updated 7 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆127 · May 6, 2025 · Updated last year
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆444 · Jul 11, 2025 · Updated last year
UmeanNever / RankSurprisalRatio
[ACL 2026 Main] Official Repo for Paper "Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Ali…
☆17 · Jul 1, 2026 · Updated 3 weeks ago
TianduoWang / DPO-ST
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
☆54 · Jul 28, 2024 · Updated last year
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆28 · Dec 23, 2024 · Updated last year
shunzh / mcts-for-llm
This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.
☆16 · Jun 28, 2024 · Updated 2 years ago
RenzeLou / Muffin
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
☆16 · Oct 31, 2024 · Updated last year
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86 · May 21, 2025 · Updated last year
zjuchenlong / WSAG
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
☆14 · Nov 25, 2023 · Updated 2 years ago
SalesforceAIResearch / LaTRO
☆127 · Jun 2, 2026 · Updated last month