Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.
☆18Jan 14, 2025Updated last year
Alternatives and similar repositories for MoGU
Users that are interested in MoGU are comparing it to the libraries listed below
Sorting:
- ☆25Jan 17, 2025Updated last year
- code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)☆22Apr 26, 2025Updated 10 months ago
- Code for LLM_Catastrophic_Forgetting via SAM.☆11Jun 7, 2024Updated last year
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)☆26Jun 27, 2024Updated last year
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…☆23Jul 26, 2024Updated last year
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source…☆75Jan 16, 2025Updated last year
- ☆21Jul 26, 2025Updated 7 months ago
- A survey on harmful fine-tuning attack for large language model☆235Feb 25, 2026Updated 3 weeks ago
- ☆21Aug 8, 2025Updated 7 months ago
- ☆44Oct 1, 2024Updated last year
- ☆14Feb 26, 2025Updated last year
- This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation"☆54Feb 2, 2025Updated last year
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search☆17Jan 24, 2026Updated last month
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…☆345Feb 23, 2024Updated 2 years ago
- Identification of the Adversary from a Single Adversarial Example (ICML 2023)☆10Jul 15, 2024Updated last year
- [ICLR 2025] On Evluating the Durability of Safegurads for Open-Weight LLMs☆13Jun 20, 2025Updated 9 months ago
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews☆17Dec 14, 2025Updated 3 months ago
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models☆19Aug 17, 2025Updated 7 months ago
- ☆19May 14, 2025Updated 10 months ago
- Awesome Large Reasoning Model(LRM) Safety.This repository is used to collect security-related research on large reasoning models such as …☆82Updated this week
- Official code for PLoP☆17Mar 6, 2026Updated 2 weeks ago
- Advanced Machine Learning Fall 2020 Project Repository☆12Dec 12, 2020Updated 5 years ago
- [NeurIPS'24] "NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes"☆10Sep 18, 2025Updated 6 months ago
- A repo for LLM jailbreak☆14Sep 5, 2023Updated 2 years ago
- Debian packaging for NNCP [archived], moved to https://salsa.debian.org/go-team/packages/nncp☆14Feb 18, 2023Updated 3 years ago
- [ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models☆16Jun 18, 2025Updated 9 months ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- ☆35Sep 13, 2023Updated 2 years ago
- LLM手撕代码合集☆21Mar 25, 2025Updated 11 months ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"☆11Jun 18, 2024Updated last year
- Analyzing LLM Alignment via Token distribution shift☆17Jan 26, 2024Updated 2 years ago
- ☆23May 20, 2025Updated 10 months ago
- ☆11Jun 20, 2023Updated 2 years ago
- Code for CVPR24 Paper - Resource-Efficient Transformer Pruning for Finetuning of Large Models☆12Oct 31, 2025Updated 4 months ago
- ☆16Feb 8, 2024Updated 2 years ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…☆16Feb 15, 2024Updated 2 years ago
- Official repository for "DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Opinion Text Generation"☆10May 20, 2022Updated 3 years ago
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? (CVPR 2021)☆14Jul 16, 2021Updated 4 years ago
- Official implementation of the paper “Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning”☆20Aug 20, 2025Updated 7 months ago