shunzh / mcts-for-llm
This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.
☆11 Updated 6 months ago
Alternatives and similar repositories for mcts-for-llm:
Users who are interested in mcts-for-llm are comparing it to the libraries listed below.
- ☆20 Updated 3 months ago
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning ☆46 Updated 3 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆98 Updated last year
- ☆111 Updated 6 months ago
- CodeUltraFeedback: aligning large language models to coding preferences ☆66 Updated 6 months ago
- This is the official project for our paper: Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers ☆28 Updated last year
- ☆53 Updated 3 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆90 Updated last month
- ☆27 Updated 2 weeks ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆49 Updated 7 months ago
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs ☆42 Updated 6 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆37 Updated last month
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆57 Updated this week
- Repo of paper "Free Process Rewards without Process Labels" ☆100 Updated this week
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆55 Updated 8 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆53 Updated 10 months ago
- ☆76 Updated 6 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆66 Updated 2 weeks ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆112 Updated 4 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 Updated 4 months ago
- ☆20 Updated 7 months ago
- ☆93 Updated 6 months ago
- ☆38 Updated 3 months ago
- [ACL 2024] The project of Symbol-LLM ☆46 Updated 6 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆132 Updated 3 months ago
- A repository for transformer critique learning and generation ☆88 Updated last year
- ☆40 Updated 8 months ago
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization" ☆32 Updated 3 months ago
- Self-Alignment with Principle-Following Reward Models ☆151 Updated 10 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆111 Updated 2 months ago