Online Preference Alignment for Language Models via Count-based Exploration
☆20Jan 14, 2025Updated last year
Alternatives and similar repositories for COPO
Users that are interested in COPO are comparing it to the libraries listed below.
- Accompanying Code for "Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning", ICML 2023☆25Dec 29, 2023Updated 2 years ago
- Official Implementation of "Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach"☆39Apr 6, 2026Updated 2 months ago
- [NeurIPS' 24] The PyTorch implementation of our paper: "Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learnin…☆21Oct 10, 2024Updated last year
- A benchmark for evaluating reinforcement learning algorithms that train the policies using imaginary rollouts from LLMs.☆14Nov 4, 2025Updated 7 months ago
- [NeurIPS 2025] Official Implementation of "HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning"☆90Nov 6, 2025Updated 7 months ago
- LLM-Empowered State Representation for Reinforcement Learning (ICML2024 Accepted paper)☆39Jun 14, 2024Updated 2 years ago
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning (ICML 2024). Current SOTA model-free safe RL algorithm on …☆16Jul 12, 2024Updated last year
- Official PyTorch Implementation of Paper -- "MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains"☆275Nov 11, 2025Updated 7 months ago
- Blog post: how to do deterministic policy gradient with gumbel softmax and why you should do it.☆12Jun 20, 2017Updated 8 years ago
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training☆53Dec 13, 2025Updated 6 months ago
- ☆13May 13, 2025Updated last year
- Official implementation of paper: LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Serie…☆18Dec 19, 2025Updated 5 months ago
- ☆16Jun 12, 2024Updated 2 years ago
- G-HER algorithm☆18May 24, 2019Updated 7 years ago
- ☆42May 19, 2025Updated last year
- The repository is for Reinforcement-Learning Uncertainty research, in which we investigate various uncertain factors in RL.☆23Jun 16, 2023Updated 3 years ago
- Code accompanying the paper "Off-Policy Primal-Dual Safe Reinforcement Learning"☆22Mar 29, 2024Updated 2 years ago
- Implementation of "Towards Understanding Mixture of Experts in Deep Learning", NeurIPS 2022☆10Jan 6, 2023Updated 3 years ago
- Source code of the paper titled "Digital Semantic Communications: An Alternating Multi-Phase Training Strategy with Mask Attack"☆16Oct 5, 2025Updated 8 months ago
- TVCG 2022: Task-Aware Sampling Layer for Point-Wise Analysis☆16Jan 21, 2024Updated 2 years ago
- ☆34Jul 15, 2025Updated 11 months ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- Code for paper "Efficient Sparse Coding using Hierarchical Riemannian Pursuit," in IEEE Transactions on Signal Processing, Y. Xue, V. K. …☆13Jul 20, 2021Updated 4 years ago
- J-BHI 2024: Exploiting Hierarchical Interactions for Protein Surface Learning☆17Jan 21, 2024Updated 2 years ago
- ☆106Jul 18, 2025Updated 11 months ago
- An LLM prompt to give some semblance of referential recursive structure☆26Apr 29, 2026Updated last month
- The official code for [ECCV2020] "HALO: Hardware-aware Learning to Optimize"☆10Mar 22, 2023Updated 3 years ago
- Bambo is a new proxy framework. Compared with mainstream frameworks, it is more lightweight and flexible and can handle various load task…☆33Feb 10, 2025Updated last year
- Code for the paper "Continual Model-Based Reinforcement Learning with Hypernetworks"☆15Jul 28, 2021Updated 4 years ago
- CycleQD is a framework for parameter space model merging.☆49Feb 1, 2025Updated last year
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning☆29Feb 21, 2022Updated 4 years ago
- Implementation of Latent Diffusion Planning (Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn)☆66Jun 29, 2025Updated 11 months ago
- NuART-Py: Python Library of Adaptive Resonance Theory Neural Network☆10Jan 26, 2020Updated 6 years ago
- Official Code for "Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning"☆146May 16, 2025Updated last year
- Part of a research scholarship. I built a basic 2D driving sim with simulated lidar data to train a Deep Q Neural Network. So far after abo…☆11Feb 15, 2017Updated 9 years ago
- TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control☆437Feb 7, 2026Updated 4 months ago
- ☆31Feb 27, 2025Updated last year
- ardrone simulation in Gazebo (for Kinetic and Gazebo 7). Now it can run.☆10Oct 27, 2017Updated 8 years ago
- trending repositories and news related to AI☆11Mar 22, 2019Updated 7 years ago