OpenRL-Lab / Ray_Tutorial
Tutorial for Ray
☆13 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for Ray_Tutorial
- AI Alignment: A Comprehensive Survey ☆128 · Updated last year
- Uni-RLHF platform for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024… ☆30 · Updated this week
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆74 · Updated 9 months ago
- ☆130 · Updated 6 months ago
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆151 · Updated 11 months ago
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆108 · Updated 7 months ago
- HFAI deep learning models ☆90 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆109 · Updated 4 months ago
- ☆54 · Updated last month
- The code repo for Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization ☆94 · Updated 2 months ago
- Baseline for NeurIPS_Auto_Bidding_General_Track ☆24 · Updated 3 months ago
- Align Anything: Training All-modality Model with Feedback ☆248 · Updated last week
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆46 · Updated 2 weeks ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆55 · Updated 3 months ago
- The baseline for NeurIPS_Auto_Bidding_AIGB_Track ☆30 · Updated 3 months ago
- ☆25 · Updated 2 months ago
- GAOKAO-Bench-2023 is a supplement to GAOKAO-Bench, a dataset for evaluating large language models. ☆18 · Updated 11 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆32 · Updated 6 months ago
- ☆24 · Updated 7 months ago
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆208 · Updated this week
- ☆115 · Updated 4 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆99 · Updated 3 weeks ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆99 · Updated 8 months ago
- Official code for the paper "Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining" ☆13 · Updated this week
- ☆48 · Updated last year
- A collection of papers on LLMs with RL ☆230 · Updated 6 months ago
- ☆61 · Updated this week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆123 · Updated this week
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆126 · Updated 5 months ago
- ☆79 · Updated 7 months ago