Baichenjia/COPO

Online Preference Alignment for Language Models via Count-based Exploration

☆21

Alternatives and similar repositories for COPO

Users that are interested in COPO are comparing it to the libraries listed below.

breez3young / DIMA
[NIPS'25] Official Implementation of "Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective" in PyTorch.
☆17 · Nov 11, 2025 · Updated 8 months ago
breez3young / TACO
Official Implementation of "Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach"
☆39 · Apr 6, 2026 · Updated 3 months ago
samlobel / CFN
Accompanying Code for "Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning", ICML 2023
☆25 · Dec 29, 2023 · Updated 2 years ago
LXXXXR / Kaleidoscope
[NeurIPS' 24] The PyTorch implementation of our paper: "Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learnin…
☆21 · Oct 10, 2024 · Updated last year
LXXXXR / ICES
[ICML' 24] The PyTorch implementation of our paper: "Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforc…
☆25 · May 29, 2024 · Updated 2 years ago
SimonZhan-code / Step-Wise_SafeRL_Pixel
Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations"
☆11 · Apr 5, 2024 · Updated 2 years ago
yandexdataschool / gumbel_dpg
Blog post: how to do deterministic policy gradient with gumbel softmax and why you should do it.
☆12 · Jun 20, 2017 · Updated 9 years ago
ZhaolinGao / REFUEL
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆25 · Oct 8, 2024 · Updated last year
Levi-Ackman / LiNo
Official implementation of paper: LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Serie…
☆18 · Dec 19, 2025 · Updated 7 months ago
yayayacc / TIDE
☆18 · Feb 4, 2026 · Updated 5 months ago
Baichenjia / Pix2Pix-eager
Tensorflow eager implementation of Pix2Pix (Image-to-image translation with conditional adversarial networks)
☆12 · Aug 12, 2019 · Updated 6 years ago
ant-research / M2-Miner
[ICLR 2026] M2-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
☆55 · Apr 22, 2026 · Updated 3 months ago
papercopilot / iclr-insights
Insights from the ICLR Peer Review and Rebuttal Process
☆16 · Nov 24, 2025 · Updated 8 months ago
HauffQian / DGAP
☆14 · May 13, 2025 · Updated last year
LAMDA-RL / ImagineBench
A benchmark for evaluating reinforcement learning algorithms that train the policies using imaginary rollouts from LLMs.
☆15 · Nov 4, 2025 · Updated 8 months ago
xyq7 / Human-Contribution-Measurement
☆13 · Jun 4, 2025 · Updated last year
jqueeney / robust-safe-rl
Robust and safe deep reinforcement learning algorithms
☆17 · Mar 27, 2024 · Updated 2 years ago
letitiabanana / PnP-OVSS
[CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
☆18 · Jul 22, 2024 · Updated 2 years ago
yihangyao / OASIS
☆20 · Nov 3, 2024 · Updated last year
maoyixiu / DMG
[NeurIPS 2024] Doubly Mild Generalization for Offline Reinforcement Learning
☆17 · Oct 29, 2025 · Updated 9 months ago
HenryLHH / fusion
This is the source code of FUSION, a safety-aware causal representation for generalizable driving agents.
☆28 · Oct 23, 2024 · Updated last year
Paitesanshi / CharacterBox
☆24 · Dec 30, 2024 · Updated last year
maohangyu / TIT_open_source
The official implementation of "Transformer in Transformer as Backbone for Deep Reinforcement Learning"
☆59 · Dec 27, 2023 · Updated 2 years ago
ModalMinds / MM-PRM
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
☆30 · May 26, 2025 · Updated last year
thu-rllab / LESR
LLM-Empowered State Representation for Reinforcement Learning (ICML2024 Accepted paper)
☆42 · Jun 14, 2024 · Updated 2 years ago
yassineCh / CAPS
Code for Constraint-Adaptive Policy Switching for Offline Safe Reinforcement Learning, AAAI 2025
☆15 · Dec 19, 2024 · Updated last year
PKU-ICST-MIPL / Finedefics_ICLR2025
☆94 · Mar 20, 2026 · Updated 4 months ago
xiaoyan07 / SAM_MLoRA
☆23 · May 28, 2025 · Updated last year
czp16 / FCSRL
Feasibility Consistent Representation Learning for Safe Reinforcement Learning (ICML 2024). Current SOTA model-free safe RL algorithm on …
☆17 · Jul 12, 2024 · Updated 2 years ago
ZifanWu / CAL
Code accompanying the paper "Off-Policy Primal-Dual Safe Reinforcement Learning"
☆22 · Mar 29, 2024 · Updated 2 years ago
vacancy / PDSketch-Alpha-Release
☆17 · Nov 1, 2023 · Updated 2 years ago
xmed-lab / DiffCMR
☆14 · Dec 11, 2023 · Updated 2 years ago
TeleHuman / MoRE
Official PyTorch Implementation of Paper -- "MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains"
☆291 · Nov 11, 2025 · Updated 8 months ago
lyqun / Task-Aware_Sampling
TVCG 2022: Task-Aware Sampling Layer for Point-Wise Analysis
☆16 · Jan 21, 2024 · Updated 2 years ago
yokoxue / LpDL
The codes are for the paper: "Complete Dictionary Learning via \ell_p-norm Maximization", Yifei Shen*, Ye Xue*, Jun Zhang, Khaled B. …
☆11 · Nov 21, 2020 · Updated 5 years ago
NVlabs / NFT
Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin…
☆88 · Sep 8, 2025 · Updated 10 months ago
mahaitongdae / Feasible-Actor-Critic
Code for paper Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety.
☆20 · May 22, 2022 · Updated 4 years ago
allenai / sso
Repository for Skill Set Optimization
☆14 · Jul 26, 2024 · Updated 2 years ago
yunhuijang / GEEL
A Simple and Scalable Representation for Graph Generation (ICLR 2024)
☆21 · Mar 19, 2024 · Updated 2 years ago