junkangwu/QAE

[ICLR 2026] Quantile Advantage Estimation for Entropy-Safe Reasoning

☆29

Alternatives and similar repositories for QAE

Users who are interested in QAE are comparing it to the repositories listed below.

oceanoceanna / LLMEraser
☆15 · Feb 26, 2025 · Updated last year
AkaliKong / PaperClaw
☆22 · Mar 11, 2026 · Updated 4 months ago
injadlu / DAMA
[ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs"
☆16 · May 24, 2025 · Updated last year
acharkq / Training-Free-Graph-Matching
Source code of "Training Free Graph Neural Networks for Graph Matching"
☆12 · Jul 9, 2022 · Updated 4 years ago
Henrymachiyu / FIPO
Code implementing FIPO, a value-free RL recipe for eliciting deeper reasoning from a clean base model.
☆18 · Jul 14, 2026 · Updated last week
AkaliKong / iLoRA
☆27 · Jan 20, 2025 · Updated last year
junkangwu / beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
☆51 · Oct 23, 2024 · Updated last year
Optimization-AI / DisCO
[NeurIPS 2025] Discriminative Constrained Optimization for Reinforcing Large Reasoning Models
☆53 · Mar 14, 2026 · Updated 4 months ago
PKU-YuanGroup / PiCO
[ICLR'25] PiCO: Peer Review in LLMs based on the Consistency Optimization, https://arxiv.org/pdf/2402.01830
☆36 · Feb 16, 2025 · Updated last year
SophieZheng998 / ALI-Agent
Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"
☆21 · Jan 31, 2026 · Updated 5 months ago
xhwang22 / Awesome-Reward-Hacking
A curated list of papers and resources on Reward Hacking, Emergent Misalignment, and Proxy Exploitation in Large Models
☆41 · Apr 17, 2026 · Updated 3 months ago
AlphaLab-USTC / OhMyCode
Minimal and Customizable CC-Style Coding Agent
☆131 · Apr 2, 2026 · Updated 3 months ago
bcdnlp / PRD
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
☆12 · Apr 21, 2024 · Updated 2 years ago
mstaib / mmd-dro-code
Accompanying code for our NeurIPS 2019 paper
☆11 · Nov 7, 2019 · Updated 6 years ago
wizard-III / Archer2.0
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg…
☆31 · Oct 10, 2025 · Updated 9 months ago
YangZhengyi98 / DROS
☆41 · Nov 20, 2023 · Updated 2 years ago
NLie2 / what_features_jailbreak_LLMs
☆18 · Mar 30, 2025 · Updated last year
YangZhengyi98 / RecInterpreter
☆25 · Nov 16, 2023 · Updated 2 years ago
kyungmnlee / RenyiCL
Contrastive self-supervised learning using Rényi divergence
☆14 · Oct 21, 2022 · Updated 3 years ago
Linzwcs / AFT
☆13 · Jan 22, 2025 · Updated last year
syr-cn / ReMemR1
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
☆42 · Apr 13, 2026 · Updated 3 months ago
RamyaLab / pluralistic-alignment
The open-source repository for PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment, which provides a general per…
☆17 · Aug 28, 2025 · Updated 10 months ago
AlphaLab-USTC / AutoWiki-skill
☆61 · Apr 9, 2026 · Updated 3 months ago
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆44 · Aug 6, 2025 · Updated 11 months ago
syr-cn / SimSGT
[NeurIPS 2023] "Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules"
☆41 · Mar 16, 2024 · Updated 2 years ago
luisfelipewb / RL4WasteCapture
A Deep Reinforcement Learning Strategy and Framework for Floating Waste Capture
☆13 · Mar 13, 2025 · Updated last year
AlphaLab-USTC / Must-Read-LLM-Papers
☆19 · Sep 16, 2025 · Updated 10 months ago
junkangwu / Adap_tau
[WWW 2023] Official code of "Adap-$\tau$: Adaptively Modulating Embedding Magnitude for Recommendation"
☆29 · Jan 4, 2024 · Updated 2 years ago
EnricoCancelli / ProximitySocialNav
Code repository for the paper "Exploiting Proximity-Aware Tasks for Embodied Social Navigation"
☆12 · Nov 16, 2023 · Updated 2 years ago
xiaomi-mlab / SurroundSDF
☆10 · Apr 8, 2024 · Updated 2 years ago
caokai1073 / UnionCom
Software for the UnionCom algorithm
☆26 · Jul 29, 2024 · Updated last year
luka-group / CoIN
☆14 · Jun 11, 2024 · Updated 2 years ago
callsys / GMPO
[ICLR 2026] Geometric-Mean Policy Optimization
☆104 · Jan 26, 2026 · Updated 5 months ago
THUKElab / CLEME
The repository of CLEME (EMNLP 2023) and CLEME2.0 (ACL 2025)
☆12 · May 17, 2025 · Updated last year
aladinD / SafeMERGE
Code for SafeMERGE (ICLR 2025).
☆15 · Apr 1, 2025 · Updated last year
ArthurLeoM / peft-givens
Source code of (quasi-)Givens Orthogonal Fine Tuning, integrated into the peft library
☆16 · Mar 13, 2025 · Updated last year
ZhaolinGao / A-PO
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
☆41 · May 30, 2025 · Updated last year
GAIR-NLP / Preference-Dissection
☆25 · May 16, 2024 · Updated 2 years ago
eltociear / MolCA
Code for the EMNLP 2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".
☆12 · Dec 27, 2023 · Updated 2 years ago