DistRL-lab / distrl-open
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
☆23 Updated 3 months ago
Alternatives and similar repositories for distrl-open
Users who are interested in distrl-open are comparing it to the libraries listed below.
- ☆14 Updated 2 weeks ago
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation ☆36 Updated last month
- Improving Math reasoning through Direct Preference Optimization with Verifiable Pairs ☆13 Updated 2 months ago
- ☆59 Updated 3 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆34 Updated last year
- Code for our paper LLaMAR: LM-based Long-Horizon Planner for Multi-Agent Robotics ☆13 Updated 3 months ago
- Official Code For: {DLPO : Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective} ☆9 Updated last month
- Implementation of TWOSOME ☆73 Updated 4 months ago
- Direct preference optimization with f-divergences. ☆13 Updated 7 months ago
- ☆30 Updated last year
- Official implementation of the NeurIPS 2024 paper CORY ☆13 Updated 3 months ago
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24. ☆11 Updated 3 months ago
- Rewarded soups official implementation ☆58 Updated last year
- ☆78 Updated last year
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆92 Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆79 Updated 9 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆123 Updated this week
- Official code for "Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning". ☆47 Updated last year
- 📖 Full Stack Practice of the Large Language Model Training @ RLChina 2024 ☆39 Updated 7 months ago
- Official code for the ICML 2024 Spotlight paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" ☆29 Updated 7 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆212 Updated this week
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆177 Updated last month
- ☆11 Updated last week
- A comprehensive collection of process reward models. ☆88 Updated 2 weeks ago
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models ☆40 Updated 2 months ago
- ☆42 Updated 7 months ago
- ☆14 Updated 7 months ago
- ☆151 Updated this week
- ☆114 Updated 4 months ago
- [ICLR 2024] Official Implementation of ACORM ☆48 Updated last year