Harry67Hu / CORY
Official implementation of the NeurIPS 2024 paper CORY
☆16Updated 3 months ago
Alternatives and similar repositories for CORY
Users that are interested in CORY are comparing it to the libraries listed below
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents☆24Updated 4 months ago
- ☆19Updated 2 weeks ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆81Updated 10 months ago
- Improving Math reasoning through Direct Preference Optimization with Verifiable Pairs☆13Updated 3 months ago
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24.☆11Updated 4 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆125Updated last week
- This project provides a set of translators to convert OpenAI Gym environments into text-based environments. It is designed to investigate…☆18Updated last year
- [arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆32Updated last month
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"☆72Updated 2 weeks ago
- An index of algorithms for reinforcement learning from human feedback (RLHF)☆92Updated last year
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation☆38Updated 3 weeks ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents☆34Updated last year
- [ICML 2025 Oral] The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchma…☆58Updated last week
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆185Updated last year
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra…☆43Updated last week
- Segment Policy Optimization: Improved Credit Assignment in Reinforcement Learning for LLMs☆16Updated this week
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"☆37Updated 4 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆191Updated last week
- ☆17Updated last month
- Rewarded soups official implementation☆58Updated last year
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆240Updated 3 weeks ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆130Updated 2 months ago
- A Self-Training Framework for Vision-Language Reasoning☆80Updated 5 months ago
- ☆114Updated 5 months ago
- SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enh…☆31Updated 10 months ago
- ICML'2024: Q-value Regularized Transformer for Offline Reinforcement Learning☆30Updated 5 months ago
- ☆220Updated last month
- ☆14Updated 8 months ago
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆125Updated last month
- Implementation of TWOSOME☆76Updated 5 months ago