Wangmerlyn / MCTS-GSM8k-Demo
This is a repo that showcases using MCTS with LLMs to solve GSM8K problems
☆67 Updated last week
Alternatives and similar repositories for MCTS-GSM8k-Demo:
Users who are interested in MCTS-GSM8k-Demo are comparing it to the repositories listed below
- ☆60 Updated 4 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆167 Updated 2 weeks ago
- ☆101 Updated 3 months ago
- The official repository of the Omni-MATH benchmark. ☆78 Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆161 Updated 2 weeks ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆78 Updated 3 weeks ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆107 Updated last week
- A Comprehensive Survey on Long Context Language Modeling ☆113 Updated last week
- ☆125 Updated 3 weeks ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆60 Updated 5 months ago
- ☆54 Updated 5 months ago
- ☆81 Updated 11 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆162 Updated 2 weeks ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆121 Updated 8 months ago
- ☆29 Updated 4 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆115 Updated 5 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆230 Updated last month
- ☆171 Updated last month
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆84 Updated last month
- On Memorization of Large Language Models in Logical Reasoning ☆60 Updated this week
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 Updated 2 months ago
- Reformatted Alignment ☆115 Updated 6 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆126 Updated 3 months ago
- ☆92 Updated 3 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆30 Updated 10 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆103 Updated 3 weeks ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆147 Updated 6 months ago
- ☆81 Updated last year
- ☆47 Updated last month
- ☆62 Updated this week