Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆52 · Nov 9, 2024 · Updated last year
Alternatives and similar repositories for SC-MCTS
Users who are interested in SC-MCTS are comparing it to the repositories listed below.
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆126 · May 6, 2025 · Updated last year
- LCA-on-the-line (ICML 2024 Oral) ☆14 · Feb 13, 2025 · Updated last year
- ☆71 · Jun 18, 2025 · Updated 11 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆704 · Jan 20, 2025 · Updated last year
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning" ☆30 · Mar 30, 2026 · Updated last month
- ☆17 · Oct 31, 2023 · Updated 2 years ago
- ☆341 · Jun 5, 2025 · Updated 11 months ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆64 · Nov 11, 2025 · Updated 6 months ago
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated 2 years ago
- ☆35 · Jun 5, 2025 · Updated 11 months ago
- Reproducing R1 for Code with Reliable Rewards ☆308 · May 5, 2025 · Updated last year
- Large Reasoning Models ☆804 · Dec 3, 2024 · Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆55 · Nov 29, 2024 · Updated last year
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning” ☆17 · Feb 26, 2024 · Updated 2 years ago
- Official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner" ☆60 · Dec 20, 2023 · Updated 2 years ago
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24. ☆18 · Feb 21, 2025 · Updated last year
- [ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding ☆33 · Jan 27, 2026 · Updated 3 months ago
- ☆23 · Jul 5, 2024 · Updated last year
- Estimating hardware and cloud costs of LLMs and transformer projects ☆21 · Apr 1, 2026 · Updated last month
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-… ☆18 · Oct 4, 2025 · Updated 7 months ago
- Recipes to train the self-rewarding reasoning LLMs. ☆232 · Mar 2, 2025 · Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆43 · Sep 18, 2025 · Updated 8 months ago
- Data and codes for EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations" ☆13 · May 8, 2023 · Updated 3 years ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆189 · May 20, 2025 · Updated last year
- ☆19 · Mar 25, 2025 · Updated last year
- ☆30 · Feb 10, 2025 · Updated last year
- ☆970 · Jan 23, 2025 · Updated last year
- [ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction ☆568 · May 6, 2025 · Updated last year
- Official Code For EMNLP2025 Findings: {DLPO : Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Le… ☆10 · Dec 25, 2025 · Updated 4 months ago
- ☆10 · Jun 11, 2025 · Updated 11 months ago
- Official Repository of "Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Ste… ☆28 · Mar 9, 2026 · Updated 2 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆191 · Mar 20, 2025 · Updated last year
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆23 · Jun 15, 2025 · Updated 11 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆33 · Aug 5, 2025 · Updated 9 months ago
- ☆15 · Aug 3, 2021 · Updated 4 years ago
- ☆31 · Mar 23, 2024 · Updated 2 years ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆43 · Oct 1, 2024 · Updated last year
- ☆12 · Mar 22, 2025 · Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated 2 years ago