thu-ml/Noise-Contrastive-Alignment

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/thu-ml/Noise-Contrastive-Alignment)

thu-ml / Noise-Contrastive-Alignment

Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024)

☆59

Alternatives and similar repositories for Noise-Contrastive-Alignment

Users that are interested in Noise-Contrastive-Alignment are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

thu-ml / Efficient-Diffusion-Alignment
View on GitHub
Official Codebase for "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control" (NeurIPS 2024)
☆15Oct 29, 2024Updated last year
thu-ml / CCA
View on GitHub
Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"
☆37Feb 11, 2025Updated last year
alecwangcq / f-divergence-dpo
View on GitHub
Direct preference optimization with f-divergences.
☆17Nov 3, 2024Updated last year
junkangwu / alpha-DPO
View on GitHub
[ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization"
☆31Jan 10, 2026Updated 6 months ago
sauc-abadal / ALT
View on GitHub
Official repository for ALT (ALignment with Textual feedback).
☆10Jul 25, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ZhaolinGao / A-PO
View on GitHub
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
☆41May 30, 2025Updated last year
yale-nlp / refdpo
View on GitHub
☆16Jul 23, 2024Updated 2 years ago
SALT-NLP / demonstrated-feedback
View on GitHub
☆131Oct 1, 2024Updated last year
Reason-Wang / NAT
View on GitHub
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆28Mar 14, 2024Updated 2 years ago
maxreciprocate / offline
View on GitHub
Offline RL experiments
☆15Oct 1, 2022Updated 3 years ago
sail-sg / understand-r1-zero
View on GitHub
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,268Aug 27, 2025Updated 10 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
View on GitHub
Reference implementation for Token-level Direct Preference Optimization(TDPO)
☆156Feb 14, 2025Updated last year
YiqinYang / VEM
View on GitHub
Codes accompanying the paper "Offline Reinforcement Learning with Value-Based Episodic Memory" (ICLR 2022 https://arxiv.org/abs/2110.0979…
☆15Mar 9, 2022Updated 4 years ago
AIGeeksGroup / PresentAgent-2
View on GitHub
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
☆17Jun 5, 2026Updated last month
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Infini-AI-Lab / M2PO
View on GitHub
☆34Oct 8, 2025Updated 9 months ago
kkk-an / UltraIF
View on GitHub
Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.
☆21Apr 3, 2025Updated last year
chujiezheng / LLM-Extrapolation
View on GitHub
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75May 20, 2025Updated last year
TianHongZXY / RLVR-Decomposed
View on GitHub
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆166Mar 2, 2026Updated 4 months ago
InternLM / OREAL
View on GitHub
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆190Mar 20, 2025Updated last year
WujiangXu / EPO
View on GitHub
The code for paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"
☆40Jul 13, 2026Updated last week
princeton-nlp / SimPO
View on GitHub
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
☆956Feb 16, 2025Updated last year
NineAbyss / S2R
View on GitHub
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆77Apr 22, 2025Updated last year
cx0 / llm-typos
View on GitHub
Impact of typos and common misspellings on LLM task performance.
☆25Mar 22, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
jun297 / v1
View on GitHub
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
☆21Updated this week
qinyiwei / InfoBench
View on GitHub
☆61Aug 22, 2024Updated last year
tml1026 / RoleCraft
View on GitHub
☆21Feb 15, 2024Updated 2 years ago
LiuAmber / RAHF
View on GitHub
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆28Sep 25, 2024Updated last year
sergiotasconmorales / consistency_vqa
View on GitHub
Repository of paper Consistency-preserving Visual Question Answering in Medical Imaging (MICCAI2022)
☆26Mar 28, 2023Updated 3 years ago
MattYoon / reasoning-models-confidence
View on GitHub
[NeurIPS 2025] Reasoning Models Better Express Their Confidence"
☆23Nov 19, 2025Updated 8 months ago
Stilwell-Git / Adaptation-with-Noisy-OracLE
View on GitHub
PyTorch implementation for our paper "Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation"
☆14Apr 19, 2023Updated 3 years ago
rycolab / odpo
View on GitHub
This repository contains code for the paper Direct Preference Optimization with an Offset (ODPO).
☆21Feb 17, 2025Updated last year
ContextualAI / HALOs
View on GitHub
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
☆908Sep 30, 2025Updated 9 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
thu-coai / SPaR
View on GitHub
☆47Jun 11, 2025Updated last year
facebookresearch / gen_dgrl
View on GitHub
Official codebase for "The Generalization Gap in Offline Reinforcement Learning" accepted to ICLR 2024
☆29Apr 8, 2026Updated 3 months ago
sunblaze-ucb / reasoning_ladder
View on GitHub
☆35May 16, 2025Updated last year
seanzhang-zhichen / Qwen-WisdomVast
View on GitHub
Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …
☆17Apr 12, 2024Updated 2 years ago
meowpass / FollowComplexInstruction
View on GitHub
Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…
☆55Jun 24, 2024Updated 2 years ago
zhongwanjun / ProQA
View on GitHub
The code for paper "ProQA: Structural Prompt-based Pre-training for Unified Question Answering"
☆11Feb 7, 2023Updated 3 years ago
Joluck / MiSS
View on GitHub
MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc…
☆35Mar 9, 2026Updated 4 months ago