SII-MARFT / MARFT
☆15 · Updated 4 months ago
Alternatives and similar repositories for MARFT
Users who are interested in MARFT are comparing it to the libraries listed below.
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆198 · Updated 7 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆387 · Updated 4 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆141 · Updated last month
- Repo for Anonymous purpose, pls don't distribute ☆10 · Updated last year
- [ICML 2025] "From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium" ☆29 · Updated last week
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments ☆88 · Updated 6 months ago
- ☆216 · Updated 8 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆56 · Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆93 · Updated last year
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆373 · Updated last month
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆357 · Updated last week
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆327 · Updated last year
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language … ☆143 · Updated 6 months ago
- ☆46 · Updated 8 months ago
- ☆316 · Updated 6 months ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆189 · Updated 10 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆48 · Updated 5 months ago
- ☆212 · Updated 9 months ago
- ☆67 · Updated 7 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆115 · Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆155 · Updated last year
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆44 · Updated 9 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆132 · Updated 8 months ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆78 · Updated 5 months ago
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models ☆14 · Updated 2 years ago
- Rewarded soups official implementation ☆62 · Updated 2 years ago
- Reinforced Multi-LLM Agents training ☆60 · Updated 5 months ago
- ☆117 · Updated 10 months ago
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24. ☆17 · Updated 9 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆148 · Updated 9 months ago