tmlr-group / G-effect
[ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"
☆11 Updated last month
Alternatives and similar repositories for G-effect:
Users interested in G-effect are comparing it to the repositories listed below.
- This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable". ☆12 Updated 3 weeks ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 Updated 2 months ago
- ☆21 Updated 3 weeks ago
- Translation of the VHL repo in Paddle ☆25 Updated last year
- ☆34 Updated 6 months ago
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024) ☆41 Updated 4 months ago
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆36 Updated 5 months ago
- ☆24 Updated last year
- SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities ☆12 Updated last week
- ☆16 Updated 7 months ago
- ☆17 Updated 2 weeks ago
- Identification of the Adversary from a Single Adversarial Example (ICML 2023) ☆9 Updated 8 months ago
- ☆13 Updated 7 months ago
- "In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel*, and Himabindu Lakkaraju*; ICML 2024. ☆24 Updated last year
- Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep" ☆83 Updated 8 months ago
- ☆53 Updated 8 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation" ☆16 Updated last month
- ☆20 Updated 3 months ago
- ☆11 Updated last year
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆17 Updated 3 months ago
- [NeurIPS 2023] "Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning" by Yihua Zhang*, Yimeng Zhang*, … ☆11 Updated last year
- [ICLR 2025] Code & Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 Updated 9 months ago
- [CCS-LAMPS'24] LLM IP Protection Against Model Merging ☆14 Updated 5 months ago
- An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection. ☆12 Updated last month
- The official code of the paper "A Closer Look at Machine Unlearning for Large Language Models". ☆24 Updated 3 months ago
- This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS 2024) ☆19 Updated 6 months ago
- An implementation for MLLM oversensitivity evaluation ☆13 Updated 4 months ago
- This repo covers the safety topic, including attacks, defenses, and studies related to reasoning and RL ☆14 Updated this week
- GitHub repo for the NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models" ☆12 Updated 5 months ago
- This is the project for IRM (invariant risk minimization) methods ☆12 Updated 3 years ago