ZhaolinGao/A-PO

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

☆41

Alternatives and similar repositories for A-PO

Users who are interested in A-PO are comparing it to the repositories listed below.

danieldritter / OAPL
☆30 · Feb 24, 2026 · Updated 4 months ago
Freder-chen / ReasonGenRM
A simple implementation of ReasonGenRM.
☆19 · Apr 21, 2025 · Updated last year
ZhaolinGao / TD-VAE-CF
Mitigating the Filter Bubble while Maintaining Relevance: Targeted Diversification with VAE-based Recommender Systems
☆10 · Mar 15, 2023 · Updated 3 years ago
ZhaolinGao / REFUEL
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
☆25 · Oct 8, 2024 · Updated last year
hcoxec / soft_h
Soft entropy estimation
☆16 · May 29, 2026 · Updated last month
ZhaolinGao / Reviewer2
Optimizing Review Generation Through Prompt Generation
☆17 · Apr 15, 2024 · Updated 2 years ago
liujch1998 / ppo-mcts
☆21 · Nov 13, 2023 · Updated 2 years ago
Infini-AI-Lab / GRESO
☆81 · Jun 8, 2026 · Updated last month
jinpz / q_sharp
The official code release for Q#: Provably Optimal Distributional RL for LLM Post-Training
☆20 · Mar 4, 2025 · Updated last year
juzhengz / logit-fusion
Learning from Mixed Rollouts: Logit Fusion as a Bridge Between Imitation and Exploration
☆17 · Feb 24, 2026 · Updated 4 months ago
zkshan2002 / RTO
☆22 · Jun 4, 2025 · Updated last year
thu-ml / Noise-Contrastive-Alignment
Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)
☆59 · Nov 8, 2024 · Updated last year
jwkirchenbauer / mtp-lm
Source code accompanying a research paper on training multi-token prediction language models using self-distillation.
☆39 · Feb 21, 2026 · Updated 5 months ago
597358816 / AEPO
Arbitrary Entropy Policy Optimization: Entropy Is Controllable in Reinforcement Fine-tuning
☆17 · Jan 19, 2026 · Updated 6 months ago
ars22 / scaling-LLM-math-synthetic-data
Code and data used in the paper "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
☆32 · Jun 16, 2024 · Updated 2 years ago
bbartoldson / TBA
Official implementation of TBA for async LLM post-training.
☆31 · Nov 5, 2025 · Updated 8 months ago
junkangwu / QAE
[ICLR 2026] Quantile Advantage Estimation for Entropy-Safe Reasoning
☆29 · Oct 14, 2025 · Updated 9 months ago
NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year
robjsliwa / pyprolog
Prolog implemented in Python
☆12 · Sep 6, 2024 · Updated last year
junkangwu / alpha-DPO
[ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization"
☆31 · Jan 10, 2026 · Updated 6 months ago
tpoisonooo / open-r1
Fully open reproduction of DeepSeek-R1
☆11 · Mar 24, 2025 · Updated last year
songmzhang / DSKDv2
The official implementation of the paper "A Dual-Space Framework for General Knowledge Distillation of Large Language Models".
☆18 · Jan 4, 2026 · Updated 6 months ago
sands321 / znote
🖖 A graph-based note-taking system designed to make personal notes more useful!
☆11 · Jan 17, 2021 · Updated 5 years ago
LAMDA-RL / ACT
Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24)
☆17 · Feb 10, 2024 · Updated 2 years ago
rycolab / kl-rb
This repository contains code for the paper "Better Estimation of the KL Divergence Between Language Models"
☆19 · May 30, 2025 · Updated last year
kkk55596 / D-Former
A PyTorch implementation of D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation.
☆12 · Oct 26, 2022 · Updated 3 years ago
Infini-AI-Lab / M2PO
☆32 · Oct 8, 2025 · Updated 9 months ago
thu-ml / Efficient-Diffusion-Alignment
Official Codebase for "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control" (NeurIPS 2024)
☆15 · Oct 29, 2024 · Updated last year
georgehc / mnar_mc
☆12 · Nov 2, 2021 · Updated 4 years ago
EnricoCancelli / ProximitySocialNav
Code for the paper "Exploiting Proximity-Aware Tasks for Embodied Social Navigation"
☆12 · Nov 16, 2023 · Updated 2 years ago
TAU-VAILab / HaLo-NeRF
☆16 · Apr 30, 2024 · Updated 2 years ago
TIGER-AI-Lab / StructEval
Evaluating LLMs' abilities to generate structural output [TMLR2025]
☆23 · Jun 12, 2026 · Updated last month
horizon-llm / OpenKimi
[ICML2026] Reproduce Kimi K1.5/K2 RL algorithm and rollout system
☆19 · Apr 9, 2026 · Updated 3 months ago
BaohaoLiao / frac-cot
[COLM 2026] An efficient 3D sampling method for long-CoT LLMs.
☆16 · May 25, 2025 · Updated last year
ChenxinAn-fdu / POLARIS
Scaling RL on advanced reasoning models
☆691 · Oct 20, 2025 · Updated 9 months ago
hamishivi / EasyLM
Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆78 · Aug 17, 2024 · Updated last year
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
Fire-friend / dugMatting
Uncertainty-guided matting (ICML 2023)
☆12 · Aug 3, 2023 · Updated 2 years ago
ml-feedback-sys / materials-f23
☆10 · Nov 15, 2023 · Updated 2 years ago