git-disl/Virus

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/git-disl/Virus)

git-disl / Virus

This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"

☆56

Alternatives and similar repositories for Virus

Users that are interested in Virus are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

git-disl / Lisa
View on GitHub
This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)
☆28Sep 10, 2024Updated last year
git-disl / Booster
View on GitHub
This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…
☆40Mar 22, 2025Updated last year
git-disl / Safety-Tax
View on GitHub
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆35Mar 11, 2025Updated last year
git-disl / Vaccine
View on GitHub
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51Jan 15, 2026Updated 6 months ago
LLLeoLi / LARF
View on GitHub
[EMNLP 2025] Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
☆15Jul 22, 2025Updated 11 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
princeton-polaris-lab / Evaluating-Durable-Safeguards
View on GitHub
[ICLR 2025] On Evluating the Durability of Safegurads for Open-Weight LLMs
☆13Jun 20, 2025Updated last year
IBM / NeuralFuse
View on GitHub
[NeurIPS'24] "NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes" by Hao-Lun …
☆10Sep 18, 2025Updated 10 months ago
ChanLiang / ORIG
View on GitHub
[ACL 2023 findings] Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization
☆17Aug 26, 2023Updated 2 years ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
View on GitHub
☆24Dec 8, 2024Updated last year
git-disl / EENet
View on GitHub
Code for Adaptive Deep Neural Network Inference Optimization with EENet
☆13Mar 28, 2024Updated 2 years ago
EternityYW / Gemini-Commonsense-Evaluation
View on GitHub
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆38Jan 3, 2024Updated 2 years ago
hetailang / SqueezeAttention
View on GitHub
☆37Oct 10, 2024Updated last year
aladinD / SafeMERGE
View on GitHub
Code for SafeMERGE (ICLR 2025).
☆15Apr 1, 2025Updated last year
DYR1 / MoGU
View on GitHub
Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.
☆18Jan 14, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
sail-sg / Rigging-ChatbotArena
View on GitHub
Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025)
☆27Feb 25, 2025Updated last year
avalonstrel / Mitigating-the-Alignment-Tax-of-RLHF
View on GitHub
☆16Feb 8, 2024Updated 2 years ago
princeton-nlp / benign-data-breaks-safety
View on GitHub
☆47Oct 1, 2024Updated last year
PKU-YuanGroup / AsFT
View on GitHub
Code for the paper "AsFT: Anchoring Safety During LLM Fune-Tuning Within Narrow Safety Basin".
☆37Jul 10, 2025Updated last year
Babelscape / LLM-Oasis
View on GitHub
This repository contains the resource introduced in the paper: "Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis"…
☆25Oct 15, 2025Updated 9 months ago
IBM / SafeLoRA
View on GitHub
Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models"
☆29Dec 21, 2025Updated 7 months ago
git-disl / awesome_LLM-harmful-fine-tuning-papers
View on GitHub
A survey on harmful fine-tuning attack for large language model (ACM CSUR)
☆247Jun 22, 2026Updated 3 weeks ago
listen0425 / Safety-Layers
View on GitHub
code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆25Apr 26, 2025Updated last year
justarter / E2URec
View on GitHub
Official Code for paper "Towards Efficient and Effective Unlearning of Large Language Models for Recommendation" (Frontiers of Computer S…
☆38Jul 19, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
homles11 / SaLoRA
View on GitHub
Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation(ICLR 2025)”
☆29Oct 23, 2025Updated 8 months ago
XinyuanLu00 / TART
View on GitHub
This is the repository for NAACL'25 paper "TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning"
☆59May 3, 2025Updated last year
IBM / AutoVP
View on GitHub
[ICLR24] "AutoVP: An Automated Visual Prompting Framework and Benchmark" by Hsi-Ai Tsao*, Lei Hsiung*, Pin-Yu Chen, Sijia Liu, and Tsung-…
☆23Sep 18, 2025Updated 10 months ago
annahedstroem / sanity-checks-revisited
View on GitHub
[NeurIPS XAIA & Springer] Code and notebooks to paper "A Fresh Look at Sanity Checks for Saliency Maps"
☆25Jul 12, 2024Updated 2 years ago
giangdip2410 / HyperRouter
View on GitHub
Code for this paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
☆33Nov 29, 2023Updated 2 years ago
CERT-Lab / lora-sb
View on GitHub
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
☆52Oct 17, 2025Updated 9 months ago
Lslland / T-Vaccine
View on GitHub
☆19Jun 21, 2025Updated last year
SophieZheng998 / ALI-Agent
View on GitHub
Official implementation for "ALI-Agent: Assessing LLMs'Alignment with Human Values via Agent-based Evaluation"
☆21Jan 31, 2026Updated 5 months ago
zjunlp / ModelKinship
View on GitHub
Exploring Model Kinship for Merging Large Language Models
☆28Apr 16, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
zjunlp / LookAheadTuning
View on GitHub
[WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews
☆17Dec 14, 2025Updated 7 months ago
MiuLab / PairDistill
View on GitHub
Source code of our paper "PairDistill: Pairwise Relevance Distillation for Dense Retrieval", EMNLP 2024 Main.
☆22Nov 28, 2024Updated last year
ethz-spylab / jailbreak-tax
View on GitHub
☆24Feb 17, 2026Updated 5 months ago
CryptoAILab / misalignment
View on GitHub
[NDSS'25] The official implementation of safety misalignment.
☆19Jan 8, 2025Updated last year
maclong01 / DeBiFormer
View on GitHub
[ACCV 2024 ] Official code for "DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention"
☆33Jan 8, 2025Updated last year
ColinLu50 / SafeDelta
View on GitHub
The official code repo for "Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets" in ICML 2025.
☆59Feb 12, 2026Updated 5 months ago
cisco-open / modelsmith
View on GitHub
A toolkit for optimizing machine learning models for practical applications
☆32Mar 6, 2025Updated last year