DistRL-lab / distrl-open
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
☆24 Updated 3 months ago
Alternatives and similar repositories for distrl-open
Users who are interested in distrl-open are comparing it to the repositories listed below.
- ☆17 Updated last month
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation ☆37 Updated 3 weeks ago
- Improving Math reasoning through Direct Preference Optimization with Verifiable Pairs ☆13 Updated 3 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆34 Updated last year
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24. ☆11 Updated 4 months ago
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆37 Updated 4 months ago
- An index of algorithms for reinforcement learning from human feedback (rlhf) ☆92 Updated last year
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models ☆42 Updated 2 months ago
- ☆61 Updated 3 months ago
- Official codebase for CuGRO: Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay ☆30 Updated last year
- Reinforced Multi-LLM Agents training ☆17 Updated 2 weeks ago
- Official implementation of the NeurIPS 2024 paper CORY ☆16 Updated 3 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆125 Updated last week
- ICLR 2025 Agent-Related Papers ☆70 Updated 7 months ago
- Direct preference optimization with f-divergences. ☆13 Updated 7 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆179 Updated 2 months ago
- Implementation of TWOSOME ☆76 Updated 5 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆191 Updated last week
- Rewarded soups official implementation ☆58 Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆80 Updated 10 months ago
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra… ☆43 Updated last week
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆37 Updated 2 weeks ago
- Official code for "Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning". ☆48 Updated last year
- [NeurIPS 2024] Official Implementation of Meta-DT ☆44 Updated 8 months ago
- Repository for the paper "InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners" ☆45 Updated last month
- A comprehensive collection of process reward models. ☆92 Updated 2 weeks ago
- ☆136 Updated 6 months ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆72 Updated 2 weeks ago
- Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning" ☆113 Updated last month
- ☆220 Updated last month