shunzh / mcts-for-llmLinks
This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.
☆16Updated last year
Alternatives and similar repositories for mcts-for-llm
Users that are interested in mcts-for-llm are comparing it to the libraries listed below
Sorting:
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆49Updated 11 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated last year
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"☆58Updated last year
- Critique-out-Loud Reward Models☆70Updated 11 months ago
- [COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees☆24Updated 2 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆108Updated last year
- augmented LLM with self reflection☆132Updated last year
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆48Updated 10 months ago
- [ACL 2024] The project of Symbol-LLM☆57Updated last year
- e☆39Updated 4 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆106Updated last month
- Official code and data repository of MathChat: MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…☆22Updated last year
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆33Updated last year
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…☆140Updated last year
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering☆61Updated 9 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆111Updated 4 months ago
- ☆41Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆37Updated 8 months ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models☆108Updated last month
- Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge☆78Updated 2 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆81Updated last year
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023)☆34Updated last year
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆76Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆84Updated 3 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages☆49Updated last month
- ☆50Updated last year
- [COLM'24] Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration☆29Updated 11 months ago
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"☆75Updated 4 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆123Updated last year
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆246Updated last month