UCSB-NLP-Chang / ULD

Implementation of paper 'Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference' [NeurIPS'24]

☆21

Alternatives and similar repositories for ULD

Users who are interested in ULD compare it to the libraries listed below.

licong-lin / negative-preference-optimization
☆60 Updated last year
jthickstun / watermark
Code for watermarking language models
☆80 Updated 11 months ago
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆96 Updated last year
ejones313 / auditing-llms
☆56 Updated 2 years ago
swj0419 / muse_bench
☆23 Updated 4 months ago
locuslab / acr-memorization
☆35 Updated 7 months ago
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆59 Updated 10 months ago
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆106 Updated 5 months ago
sail-sg / I-FSJ
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
☆65 Updated 6 months ago
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆82 Updated 4 months ago
franciscoliu / Awesome-GenAI-Unlearning
☆156 Updated last week
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆81 Updated 9 months ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆74 Updated 5 months ago
centerforaisafety / tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆90 Updated last year
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆22 Updated 8 months ago
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆143 Updated 3 months ago
zjysteven / mink-plus-plus
[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs
☆41 Updated 2 months ago
ShuheSH / A-Survey-of-the-Reasoning-Abilities-of-LLMs
☆24 Updated 5 months ago
Vaidehi99 / InfoDeletionAttacks
☆44 Updated 6 months ago
weichen-yu / LM-Extraction
☆44 Updated 2 years ago
ykwon0407 / DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
☆73 Updated 10 months ago
decoding-comp-trust / comp-trust
Codebase for decoding compressed trust.
☆24 Updated last year
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆81 Updated last year
OPTML-Group / SOUL
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆26 Updated 10 months ago
princeton-nlp / benign-data-breaks-safety
☆41 Updated 10 months ago
pratyushmaini / llm_dataset_inference
Official Repository for Dataset Inference for LLMs
☆36 Updated last year
yihuaihong / ConceptVectors
ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆36 Updated 5 months ago
facebookresearch / three_bricks
Official Implementation of the paper "Three Bricks to Consolidate Watermarks for LLMs"
☆48 Updated last year
poloclub / llm-landscape
NeurIPS'24 - LLM Safety Landscape
☆25 Updated 5 months ago
SALT-NLP / Efficient_Unlearning
☆38 Updated last year