Online Preference Alignment for Language Models via Count-based Exploration
☆18 · Jan 14, 2025 · Updated last year
Alternatives and similar repositories for COPO
Users that are interested in COPO are comparing it to the libraries listed below.
- Accompanying Code for "Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning", ICML 2023 ☆24 · Dec 29, 2023 · Updated 2 years ago
- Official Implementation of "Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach" ☆37 · Apr 6, 2026 · Updated last month
- [NeurIPS' 24] The PyTorch implementation of our paper: "Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learnin… ☆21 · Oct 10, 2024 · Updated last year
- A benchmark for evaluating reinforcement learning algorithms that train the policies using imaginary rollouts from LLMs. ☆14 · Nov 4, 2025 · Updated 6 months ago
- [ICML' 24] The PyTorch implementation of our paper: "Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforc… ☆25 · May 29, 2024 · Updated 2 years ago
- Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations" ☆11 · Apr 5, 2024 · Updated 2 years ago
- [NeurIPS 2025] Official Implementation of "HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning" ☆89 · Nov 6, 2025 · Updated 6 months ago
- LLM-Empowered State Representation for Reinforcement Learning (ICML2024 Accepted paper) ☆39 · Jun 14, 2024 · Updated last year
- [IROS2024] STAIR: Semantic-Targeted Active Implicit Reconstruction ☆17 · Aug 3, 2024 · Updated last year
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning (ICML 2024). Current SOTA model-free safe RL algorithm on … ☆16 · Jul 12, 2024 · Updated last year
- Official PyTorch Implementation of Paper -- "MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains" ☆269 · Nov 11, 2025 · Updated 6 months ago
- Blog post: how to do deterministic policy gradient with gumbel softmax and why you should do it. ☆12 · Jun 20, 2017 · Updated 8 years ago
- ☆13 · May 13, 2025 · Updated last year
- Official implementation of paper: LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Serie… ☆18 · Dec 19, 2025 · Updated 5 months ago
- ☆16 · Jun 12, 2024 · Updated last year
- G-HER algorithm ☆18 · May 24, 2019 · Updated 7 years ago
- ☆40 · May 19, 2025 · Updated last year
- ☆13 · Jun 4, 2025 · Updated 11 months ago
- The repository is for Reinforcement-Learning Uncertainty research, in which we investigate various uncertain factors in RL. ☆23 · Jun 16, 2023 · Updated 2 years ago
- Pessimistic Value Iteration for Multi-Task Data Sharing in Offline RL ☆18 · Nov 21, 2023 · Updated 2 years ago
- The official implementation of "Transformer in Transformer as Backbone for Deep Reinforcement Learning" ☆59 · Dec 27, 2023 · Updated 2 years ago
- ICML 2024 - Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning ☆10 · Jul 16, 2024 · Updated last year
- Code accompanying the paper "Off-Policy Primal-Dual Safe Reinforcement Learning" ☆22 · Mar 29, 2024 · Updated 2 years ago
- Implementation of "Towards Understanding Mixture of Experts in Deep Learning", NeurIPS 2022 ☆10 · Jan 6, 2023 · Updated 3 years ago
- [CVPR 2025] The official implementation of "CacheQuant: Comprehensively Accelerated Diffusion Models" ☆48 · Nov 2, 2025 · Updated 6 months ago
- Code for paper Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety. ☆20 · May 22, 2022 · Updated 4 years ago
- Official Implementation of "Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance". ☆67 · Oct 16, 2025 · Updated 7 months ago
- A PyTorch implementation of [VCT](https://github.com/google-research/google-research/tree/master/vct) ☆10 · Nov 25, 2022 · Updated 3 years ago
- ☆14 · Dec 11, 2023 · Updated 2 years ago
- The code is for the paper "Complete Dictionary Learning via \ell_p-norm Maximization", Yifei Shen∗, Ye Xue∗, Jun Zhang, Khaled B. … ☆11 · Nov 21, 2020 · Updated 5 years ago
- ☆34 · Jul 15, 2025 · Updated 10 months ago
- Repository for Skill Set Optimization ☆14 · Jul 26, 2024 · Updated last year
- Code for paper "Efficient Sparse Coding using Hierarchical Riemannian Pursuit," in IEEE Transactions on Signal Processing, Y. Xue, V. K. … ☆13 · Jul 20, 2021 · Updated 4 years ago
- Benchmarking Deepseek R1 API response speeds across different providers for performance comparison. ☆10 · Feb 15, 2025 · Updated last year
- An LLM prompt to give some semblance of referential recursive structure ☆26 · Apr 29, 2026 · Updated last month
- The official code for [ECCV2020] "HALO: Hardware-aware Learning to Optimize" ☆10 · Mar 22, 2023 · Updated 3 years ago
- Decoupled Q-Chunking ☆68 · May 3, 2026 · Updated 3 weeks ago
- Code for the paper "Continual Model-Based Reinforcement Learning with Hypernetworks" ☆15 · Jul 28, 2021 · Updated 4 years ago
- CycleQD is a framework for parameter space model merging. ☆49 · Feb 1, 2025 · Updated last year