hanshen95/SEAL

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/hanshen95/SEAL)

hanshen95 / SEAL

An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection.

☆24

Alternatives and similar repositories for SEAL

Users that are interested in SEAL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

hanshen95 / penalized-bilevel-gradient-descent
View on GitHub
An implementation of the penalty-based bilevel gradient descent (PBGD) algorithm and the iterative differentiation (ITD/RHG) methods.
☆19Feb 13, 2023Updated 3 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
View on GitHub
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11Feb 7, 2026Updated 5 months ago
git-disl / Lisa
View on GitHub
This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)
☆29Sep 10, 2024Updated last year
SophieZheng998 / ALI-Agent
View on GitHub
Official implementation for "ALI-Agent: Assessing LLMs'Alignment with Human Values via Agent-based Evaluation"
☆21Jan 31, 2026Updated 5 months ago
LLLeoLi / LARF
View on GitHub
[EMNLP 2025] Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
☆15Jul 22, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
yimutianyang / HGSR
View on GitHub
Implement of our TKDE paper: Hyperbolic Graph Learning for Social Recommendation
☆13Jun 3, 2024Updated 2 years ago
junkangwu / Dr_DPO
View on GitHub
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆19Jun 1, 2024Updated 2 years ago
git-disl / Safety-Tax
View on GitHub
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆35Mar 11, 2025Updated last year
tanganke / subspace_fusion
View on GitHub
Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"
☆14Mar 28, 2024Updated 2 years ago
circle-hit / Lens
View on GitHub
Code for our paper titled "Lens: Rethinking Multilingual Enhancement for Large Language Models"
☆12Oct 15, 2024Updated last year
git-disl / Virus
View on GitHub
This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"
☆56Feb 2, 2025Updated last year
git-disl / Booster
View on GitHub
This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…
☆41Mar 22, 2025Updated last year
git-disl / Vaccine
View on GitHub
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51Jan 15, 2026Updated 6 months ago
Unified-Language-Model-Alignment / src
View on GitHub
☆14Oct 7, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
aladinD / SafeMERGE
View on GitHub
Code for SafeMERGE (ICLR 2025).
☆15Apr 1, 2025Updated last year
mominabbass / Sharp-MAML
View on GitHub
Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning
☆34Feb 7, 2023Updated 3 years ago
princeton-nlp / benign-data-breaks-safety
View on GitHub
☆47Oct 1, 2024Updated last year
Egg-Hu / Awesome-Synthetic-Data-Generation
View on GitHub
☆20Jan 7, 2026Updated 6 months ago
yuchenlwu / PersonalizedSafety
View on GitHub
[NeurIPS 2025]: Personalized Safety in LLMs — A Benchmark and a Planning-Based Agent Approach
☆17Oct 30, 2025Updated 8 months ago
git-disl / awesome_LLM-harmful-fine-tuning-papers
View on GitHub
A survey on harmful fine-tuning attack for large language model (ACM CSUR)
☆247Jun 22, 2026Updated last month
listen0425 / Safety-Layers
View on GitHub
code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆25Apr 26, 2025Updated last year
PKU-YuanGroup / AsFT
View on GitHub
Code for the paper "AsFT: Anchoring Safety During LLM Fune-Tuning Within Narrow Safety Basin".
☆37Jul 10, 2025Updated last year
vfleaking / PTST
View on GitHub
Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"
☆22Sep 21, 2025Updated 10 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
yimutianyang / SIGIR23-VGCL
View on GitHub
Tensorflow implementation of our SIGIR 2023 accepted paper "Generative-Contrastive Graph Learning for Recommendation"
☆32Aug 26, 2024Updated last year
rishub-tamirisa / tamper-resistance
View on GitHub
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
☆68Jun 9, 2025Updated last year
Hanpx20 / SafeSwitch
View on GitHub
Official code repository for the paper "Internal Activation as the Polar Star for Steering Unsafe LLM Behavior"
☆15May 31, 2026Updated last month
Unispac / shallow-vs-deep-alignment
View on GitHub
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆190Apr 23, 2025Updated last year
ColinLu50 / SafeDelta
View on GitHub
The official code repo for "Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets" in ICML 2025.
☆59Feb 12, 2026Updated 5 months ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
View on GitHub
☆24Dec 8, 2024Updated last year
SuyeonC / Rad-cGAN
View on GitHub
Rad-cGAN v1.0: Radar-based precipitation nowcasting model with conditional Generative Adversarial Networks for multiple dam domains
☆11Jul 22, 2022Updated 4 years ago
circle-hit / SAPT
View on GitHub
Code for ACL 2024 accepted paper titled "SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language …
☆40Jan 13, 2025Updated last year
DeRafael / CAFE
View on GitHub
☆21Oct 25, 2021Updated 4 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
declare-lab / resta
View on GitHub
Restore safety in fine-tuned language models through task arithmetic
☆33Mar 28, 2024Updated 2 years ago
circle-hit / MuCDN
View on GitHub
Code for COLING 2022 accepted paper titled "MuCDN: Mutual Conversational Detachment Network for Emotion Recognition in Multi-Party Conver…
☆10Jul 21, 2023Updated 3 years ago
noahcao / disentanglement_lib_med
View on GitHub
[NeurIPS 2022] disentanglement evaluation robust to model dimension variance.
☆10Sep 21, 2022Updated 3 years ago
vitorpamplona / splitlearning
View on GitHub
Simple Python Socket-based Split Learning technique using PyTorch
☆14Mar 13, 2020Updated 6 years ago
criticalml-uw / TamperBench
View on GitHub
Toolkit to benchmark the tamper-resistance of LLMs.
☆28May 15, 2026Updated 2 months ago
Egg-Hu / LoRA-Recycle
View on GitHub
[CVPR 2025] LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs
☆14Jun 20, 2025Updated last year
ZIB-IOL / SMS
View on GitHub
Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"
☆12Oct 14, 2025Updated 9 months ago