rmshin / llm-mctsView external linksLinks
☆41Jun 19, 2024Updated last year
Alternatives and similar repositories for llm-mcts
Users that are interested in llm-mcts are comparing it to the libraries listed below
Sorting:
- A Python reimplementation of "Planning with Large Language Models for Code Generation" (https://arxiv.org/abs/2303.05510)☆18Dec 1, 2023Updated 2 years ago
- ☆21Jul 25, 2025Updated 6 months ago
- ☆28May 29, 2024Updated last year
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"☆14Jun 21, 2024Updated last year
- This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.☆16Jun 28, 2024Updated last year
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆329Jan 29, 2026Updated 2 weeks ago
- Using conversational games to evaluate powerful LLMs☆18Sep 3, 2023Updated 2 years ago
- Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras.☆26Apr 6, 2025Updated 10 months ago
- ☆20Mar 1, 2023Updated 2 years ago
- ☆158Mar 18, 2023Updated 2 years ago
- Learning adapter weights from task descriptions☆19Nov 12, 2023Updated 2 years ago
- Albert for Conversational Question Answering Challenge☆22Jun 12, 2023Updated 2 years ago
- ☆45Dec 12, 2024Updated last year
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year
- Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication☆21Mar 21, 2024Updated last year
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Jan 27, 2023Updated 3 years ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents☆56Jul 11, 2025Updated 7 months ago
- ☆56Nov 6, 2024Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆23Mar 12, 2024Updated last year
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076.☆25Jan 23, 2024Updated 2 years ago
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett…☆295Nov 16, 2024Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Aug 30, 2024Updated last year
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models☆68Apr 26, 2025Updated 9 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆189Mar 7, 2025Updated 11 months ago
- ☆26Nov 12, 2025Updated 3 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆690Jan 20, 2025Updated last year
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"☆118Jan 9, 2024Updated 2 years ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆70Feb 22, 2024Updated last year
- ☆120Aug 28, 2024Updated last year
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers…☆30Apr 8, 2023Updated 2 years ago
- Explore what LLMs are really leanring over SFT☆28Mar 30, 2024Updated last year
- ☆97Jan 27, 2026Updated 2 weeks ago
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)☆28Dec 19, 2023Updated 2 years ago
- Code and example data for the paper: Rule Based Rewards for Language Model Safety☆206Jul 19, 2024Updated last year
- [ICLR 2025] Code for the paper "Implicit Search via Discrete Diffusion: A Study on Chess"☆36Mar 3, 2025Updated 11 months ago
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process☆30Aug 2, 2024Updated last year