yihedeng9/rlhf-summary-notes

yihedeng9 / rlhf-summary-notes

A brief and partial summary of RLHF algorithms.

☆152

Alternatives and similar repositories for rlhf-summary-notes

Users that are interested in rlhf-summary-notes are comparing it to the libraries listed below.

PurCL / ProSec
Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"
☆18Feb 26, 2026Updated 4 months ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated 2 years ago
reds-lab / BEEAR
This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…
☆23Jul 3, 2024Updated 2 years ago
liziniu / HyperDQN
Code for ICLR 2022 Paper (HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning)
☆12Nov 28, 2023Updated 2 years ago
princeton-polaris-lab / Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
☆13Jun 20, 2025Updated last year
yihedeng9 / DuoGuard
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
☆34Feb 26, 2025Updated last year
AngelaZZZ-611 / reasoning_models_probing
☆21May 14, 2026Updated 2 months ago
tangzhy / RealCritic
☆15Jan 27, 2025Updated last year
zhqwqwq / Learning-Parity-with-CoT
[ICLR 2025] This repository contains the code to reproduce the results from our paper From Sparse Dependence to Sparse Attention: Unveili…
☆12Mar 7, 2025Updated last year
WadeYin9712 / Dynosaur
Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)
☆63Nov 30, 2023Updated 2 years ago
liziniu / policy_optimization
Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)
☆29Dec 19, 2023Updated 2 years ago
jhayes14 / black-box-attacks
Comparison of gradient estimation techniques for black-box adversarial examples
☆11Oct 31, 2018Updated 7 years ago
WadeYin9712 / UI-Simulator
Code for 🌍 UI-Simulator: LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training
☆21Oct 17, 2025Updated 9 months ago
ntucllab / CLImage_Dataset
The dataset repo of "CLCIFAR: CIFAR-Derived Benchmark Datasets with Human Annotated Complementary Labels" paper
☆17May 11, 2026Updated 2 months ago
scaleapi / mrt
https://scale.com/research/mrt
☆20Mar 16, 2026Updated 4 months ago
GuoTianYu2000 / Active-Dormant-Attention
Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆11Dec 30, 2024Updated last year
zzp1012 / LLFC
[NeurIPS 2023] Code release for "Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity"
☆19Oct 19, 2023Updated 2 years ago
xxxiaol / magic-if
Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023…
☆31Jun 4, 2023Updated 3 years ago
EnnengYang / RepresentationSurgery
Representation Surgery for Multi-Task Model Merging. ICML, 2024.
☆49Oct 10, 2024Updated last year
pipilurj / ROBOT
☆27Apr 11, 2023Updated 3 years ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆68May 31, 2024Updated 2 years ago
echohive42 / listen-to-deepseek-r1-thoughts
stream-of-consciousness experience of an AI's thinking process, complete with creative tangents and unexpected connections.
☆14Jan 29, 2025Updated last year
haonan3 / V1
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆36Apr 14, 2025Updated last year
WadeYin9712 / GD-VCR
Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).
☆29Sep 4, 2021Updated 4 years ago
tmlr-group / G-effect
[ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"
☆16Feb 27, 2025Updated last year
damanimehul / RLCR
Official repository for Beyond Binary Rewards: Training LMs to Reason about Their Uncertainty
☆67Aug 20, 2025Updated 11 months ago
launchnlp / LitCab
☆25Jun 10, 2025Updated last year
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆207Mar 7, 2025Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93Oct 30, 2024Updated last year
rosieyzh / openrlhf-pretrain
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
☆29Oct 14, 2025Updated 9 months ago
RLHFlow / Online-RLHF
A recipe for online RLHF and online iterative DPO.
☆544Dec 28, 2024Updated last year
liziniu / KnapsackRL
☆19Oct 30, 2025Updated 8 months ago
andyjm3 / Awesome-Riemannian-Optimization
This repo contains papers, books, tutorials and resources on Riemannian optimization.
☆64Mar 18, 2026Updated 4 months ago
SORRY-Bench / sorry-bench
Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)
☆83Mar 1, 2025Updated last year
Rafa-zy / QLASS
☆53Aug 24, 2025Updated 10 months ago
fjzzq2002 / random_transformers
Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024)
☆15Sep 28, 2024Updated last year
reds-lab / Meta-Sift
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …
☆20Apr 27, 2023Updated 3 years ago
andyjm3 / SLTrain
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
☆39Nov 1, 2024Updated last year
Shentao-YANG / Preference_Grounded_Guidance
Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17Jan 8, 2025Updated last year