Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆51 · Nov 9, 2024 · Updated last year
Alternatives and similar repositories for SC-MCTS
Users interested in SC-MCTS are comparing it to the repositories listed below.
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training ☆54 · Dec 13, 2025 · Updated 4 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆126 · May 6, 2025 · Updated 11 months ago
- LCA-on-the-line (ICML 2024 Oral) ☆14 · Feb 13, 2025 · Updated last year
- ☆70 · Jun 18, 2025 · Updated 10 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆66 · Oct 18, 2024 · Updated last year
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆702 · Jan 20, 2025 · Updated last year
- This repository contains the replication of the iGSM dataset generation process from the Physics of LLM paper by Zeyuan Zhu. ☆17 · Sep 13, 2024 · Updated last year
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning" ☆31 · Mar 30, 2026 · Updated last month
- ☆341 · Jun 5, 2025 · Updated 10 months ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆61 · Nov 11, 2025 · Updated 5 months ago
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated 2 years ago
- ☆35 · Jun 5, 2025 · Updated 10 months ago
- Large Reasoning Models ☆805 · Dec 3, 2024 · Updated last year
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning” ☆17 · Feb 26, 2024 · Updated 2 years ago
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆22 · Jan 25, 2025 · Updated last year
- official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner" ☆60 · Dec 20, 2023 · Updated 2 years ago
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24. ☆18 · Feb 21, 2025 · Updated last year
- [ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding ☆32 · Jan 27, 2026 · Updated 3 months ago
- ☆23 · Jul 5, 2024 · Updated last year
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆208 · Mar 4, 2025 · Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆69 · May 31, 2024 · Updated last year
- Estimating hardware and cloud costs of LLMs and transformer projects ☆21 · Apr 1, 2026 · Updated 3 weeks ago
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-… ☆18 · Oct 4, 2025 · Updated 6 months ago
- Recipes to train the self-rewarding reasoning LLMs. ☆232 · Mar 2, 2025 · Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆43 · Sep 18, 2025 · Updated 7 months ago
- Data and codes for EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations" ☆13 · May 8, 2023 · Updated 2 years ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆189 · May 20, 2025 · Updated 11 months ago
- ☆29 · Feb 10, 2025 · Updated last year
- ☆19 · Mar 25, 2025 · Updated last year
- ☆970 · Jan 23, 2025 · Updated last year
- [ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction ☆568 · May 6, 2025 · Updated 11 months ago
- Official Code For EMNLP2025 Findings: {DLPO : Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Le… ☆10 · Dec 25, 2025 · Updated 4 months ago
- ☆10 · Jun 11, 2025 · Updated 10 months ago
- Official Repository of "Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Ste… ☆28 · Mar 9, 2026 · Updated last month
- A large-scale dataset composed of high-quality synthetic images aimed at evaluating social biases in LVLMs ☆13 · Apr 7, 2026 · Updated 3 weeks ago
- Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours ☆293 · Updated this week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆192 · Mar 20, 2025 · Updated last year
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆21 · Jun 15, 2025 · Updated 10 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆33 · Aug 5, 2025 · Updated 8 months ago