zzwjames / FailureLLMUnlearning

An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)

☆29

Alternatives and similar repositories for FailureLLMUnlearning

Users who are interested in FailureLLMUnlearning are comparing it to the repositories listed below.

JayZhang42 / SLED
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models https://arxiv.org/pdf/2411.02433
☆28 Updated 8 months ago
ZhentingWang / DUMP
☆22 Updated 2 months ago
facebookresearch / AbstentionBench
A holistic benchmark for LLM abstention
☆41 Updated 2 weeks ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆42 Updated 9 months ago
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆81 Updated 9 months ago
uservan / ThinkPO
☆18 Updated this week
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47 Updated 5 months ago
katiekang1998 / reasoning_generalization
☆34 Updated 6 months ago
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆31 Updated 6 months ago
peterljq / Parsimonious-Concept-Engineering
PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
☆39 Updated 9 months ago
zjunlp / unlearn
[ACL 2025] Knowledge Unlearning for Large Language Models
☆39 Updated 3 months ago
stanford-crfm / air-bench-2024
AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies
☆23 Updated 11 months ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 Updated 3 months ago
mandyyyyii / east
☆20 Updated 3 months ago
ChnQ / TracingLLM
☆28 Updated last year
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆35 Updated 2 months ago
tianyi-lab / Mosaic-IT
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
☆19 Updated last month
UCSB-NLP-Chang / Prereq_tune
Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning"
☆10 Updated 6 months ago
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆96 Updated last year
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆112 Updated last month
ShuheSH / A-Survey-of-the-Reasoning-Abilities-of-LLMs
☆24 Updated 5 months ago
yihuaihong / ConceptVectors
ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆36 Updated 5 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆52 Updated 5 months ago
tatsu-lab / test_set_contamination
☆38 Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35 Updated 10 months ago
qcznlp / uncertainty_attack
☆20 Updated 11 months ago
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 Updated last year
LAMDASZ-ML / Self-Backtracking
☆47 Updated 5 months ago
jiahai-feng / binding-iclr
☆15 Updated last year
zjunlp / PitfallsKnowledgeEditing
[ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models
☆22 Updated last year