Online Preference Alignment for Language Models via Count-based Exploration
☆17Jan 14, 2025Updated last year
Alternatives and similar repositories for COPO
Users who are interested in COPO are comparing it to the libraries listed below.
- Accompanying Code for "Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning", ICML 2023☆22Dec 29, 2023Updated 2 years ago
- [NeurIPS' 24] The PyTorch implementation of our paper: "Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learnin…☆21Oct 10, 2024Updated last year
- Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations"☆11Apr 5, 2024Updated last year
- [IROS2024] STAIR: Semantic-Targeted Active Implicit Reconstruction☆17Aug 3, 2024Updated last year
- Official PyTorch Implementation of Paper -- "MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains"☆230Nov 11, 2025Updated 4 months ago
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning (ICML 2024). Current SOTA model-free safe RL algorithm on …☆14Jul 12, 2024Updated last year
- Blog post: how to do deterministic policy gradient with gumbel softmax and why you should do it.☆12Jun 20, 2017Updated 8 years ago
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training☆54Dec 13, 2025Updated 3 months ago
- ☆13May 13, 2025Updated 10 months ago
- Official implementation of paper: LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Serie…☆18Dec 19, 2025Updated 3 months ago
- ☆16Jun 12, 2024Updated last year
- ☆37May 19, 2025Updated 10 months ago
- ☆13Jun 4, 2025Updated 9 months ago
- The repository is for Reinforcement-Learning Uncertainty research, in which we investigate various uncertain factors in RL.☆23Jun 16, 2023Updated 2 years ago
- Pessimistic Value Iteration for Multi-Task Data Sharing in Offline RL☆18Nov 21, 2023Updated 2 years ago
- This is the source code of FUSION, a safety-aware causal representation for generalizable driving agents.☆26Oct 23, 2024Updated last year
- ☆12Nov 10, 2020Updated 5 years ago
- The official implementation of "Transformer in Transformer as Backbone for Deep Reinforcement Learning"☆56Dec 27, 2023Updated 2 years ago
- Code accompanying the paper "Off-Policy Primal-Dual Safe Reinforcement Learning"☆21Mar 29, 2024Updated last year
- ICML 2024 - Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning☆10Jul 16, 2024Updated last year
- ☆16Nov 1, 2023Updated 2 years ago
- NeurIPS 2024: Bidirectional Recurrence for Cardiac Motion Tracking with Gaussian Process Latent Coding☆16Jun 20, 2025Updated 9 months ago
- Implementation of "Towards Understanding Mixture of Experts in Deep Learning", NeurIPS 2022☆10Jan 6, 2023Updated 3 years ago
- [CVPR 2025] The official implementation of "CacheQuant: Comprehensively Accelerated Diffusion Models"☆47Nov 2, 2025Updated 4 months ago
- Code for paper Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety.☆20May 22, 2022Updated 3 years ago
- A PyTorch implementation of [VCT](https://github.com/google-research/google-research/tree/master/vct)☆10Nov 25, 2022Updated 3 years ago
- Source code of the paper titled "Digital Semantic Communications: An Alternating Multi-Phase Training Strategy with Mask Attack"☆13Oct 5, 2025Updated 5 months ago
- Code for the paper "Complete Dictionary Learning via \ell_p-norm Maximization", Yifei Shen∗, Ye Xue∗, Jun Zhang, Khaled B. …☆11Nov 21, 2020Updated 5 years ago
- ☆33Jul 15, 2025Updated 8 months ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- Easy to install Text to Speech system for Raspberry Pi 4☆15Mar 4, 2024Updated 2 years ago
- Code for paper "Efficient Sparse Coding using Hierarchical Riemannian Pursuit," in IEEE Transactions on Signal Processing, Y. Xue, V. K. …☆13Jul 20, 2021Updated 4 years ago
- ☆103Jul 18, 2025Updated 8 months ago
- J-BHI 2024: Exploiting Hierarchical Interactions for Protein Surface Learning☆17Jan 21, 2024Updated 2 years ago
- Benchmarking Deepseek R1 API response speeds across different providers for performance comparison.☆10Feb 15, 2025Updated last year
- [IEEE SENSORS 2025/26] PicoSAM2 and PicoSAM3 are in-sensor segmentation models compatible with the Sony IMX500☆28Mar 13, 2026Updated 2 weeks ago
- Please visit our demonstration website for interactive demonstrations☆33Oct 1, 2024Updated last year
- Decoupled Q-Chunking☆59Jan 10, 2026Updated 2 months ago
- TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control☆340Feb 7, 2026Updated last month