Baichenjia / COPO
Online Preference Alignment for Language Models via Count-based Exploration
☆13 Updated 2 months ago
Alternatives and similar repositories for COPO:
Users who are interested in COPO are comparing it to the repositories listed below:
- ☆35 Updated last month
- Uni-RLHF platform for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024… ☆33 Updated 4 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" ☆37 Updated last month
- Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human … ☆35 Updated last year
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆12 Updated 2 weeks ago
- The official implementation of the paper "Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction". ☆34 Updated last year
- Official repository of S-Agents: Self-organizing Agents in Open-ended Environment ☆21 Updated last year
- Dataset Reset Policy Optimization ☆30 Updated 11 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆61 Updated last month
- A vast array of Multi-Modal Embodied Robotic Foundation Models! ☆27 Updated last year
- Implementation of the new SOTA for model based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in Pytorch ☆99 Updated this week
- ☆87 Updated last month
- Natural Language Reinforcement Learning ☆84 Updated 3 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆39 Updated 3 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025) ☆26 Updated 3 months ago
- ☆54 Updated this week
- Enabling Mixed Opponent Strategy Script and Self-play on SMAC ☆25 Updated 2 months ago
- ☆26 Updated 11 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 Updated 6 months ago
- Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents. ☆96 Updated 2 weeks ago
- ☆55 Updated 3 weeks ago
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning ☆32 Updated last month
- The official implementation of Self-Exploring Language Models (SELM) ☆62 Updated 10 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆76 Updated 3 weeks ago
- ☆31 Updated 2 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆70 Updated 3 weeks ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆16 Updated last year
- ☆44 Updated last year
- Minimal RLHF implementation built on top of minGPT. ☆29 Updated 9 months ago
- The original Shared Recurrent Memory Transformer implementation ☆23 Updated 2 months ago