Baichenjia / COPO
Online Preference Alignment for Language Models via Count-based Exploration
☆14 · Updated 3 months ago
Alternatives and similar repositories for COPO:
Users interested in COPO are comparing it to the repositories listed below.
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆16 · Updated last year
- Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human … ☆36 · Updated last year
- The official implementation of the paper "Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction". ☆34 · Updated last year
- ☆36 · Updated last month
- Dataset Reset Policy Optimization ☆30 · Updated last year
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 · Updated 7 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆28 · Updated last year
- PreAct: Prediction Enhances Agent's Planning Ability (COLING 2025) ☆26 · Updated 4 months ago
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation) ☆31 · Updated 4 months ago
- This repository is the official implementation of the TRAC optimizer in Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement … ☆25 · Updated 6 months ago
- this is for fun, ain't it grand! ☆16 · Updated last year
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆29 · Updated 9 months ago
- ☆27 · Updated 10 months ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆23 · Updated last month
- Natural Language Reinforcement Learning ☆87 · Updated 4 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025) ☆41 · Updated last week
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆64 · Updated 2 months ago
- Uni-RLHF platform for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024… ☆36 · Updated 5 months ago
- ☆16 · Updated this week
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆20 · Updated 2 months ago
- LLM Dynamic Planner: Combining LLM with PDDL Planners to solve an embodied task ☆42 · Updated 3 months ago
- ☆57 · Updated 8 months ago
- ☆27 · Updated 2 months ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆13 · Updated last month
- A vast array of Multi-Modal Embodied Robotic Foundation Models! ☆27 · Updated last year
- Code for "Interactive Task Planning with Language Models" ☆27 · Updated last year
- A Data Source for Reasoning Embodied Agents ☆19 · Updated last year
- Enabling Mixed Opponent Strategy Script and Self-play on SMAC ☆26 · Updated 3 months ago
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆36 · Updated 7 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Updated 6 months ago