fyzhang1 / Oblivionis
Official Repository for "Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models"
☆28 · Updated 2 months ago
Alternatives and similar repositories for Oblivionis
Users interested in Oblivionis are comparing it to the repositories listed below.
- ☆12 · Updated 3 months ago
- Code and data for the paper "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?" (ACL 2025 Main) · ☆17 · Updated 4 months ago
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models · ☆32 · Updated 5 months ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types · ☆23 · Updated 11 months ago
- ☆42 · Updated 5 months ago
- Source code of the paper "An Unforgeable Publicly Verifiable Watermark for Large Language Models", accepted by ICLR 2024 · ☆34 · Updated last year
- ☆32 · Updated 6 months ago
- [EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models · ☆20 · Updated 7 months ago
- ☆42 · Updated 7 months ago
- Code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector" · ☆46 · Updated 3 weeks ago
- ☆20 · Updated last year
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization · ☆29 · Updated last year
- Code and dataset for the paper "Can Editing LLMs Inject Harm?" · ☆21 · Updated 11 months ago
- ☆63 · Updated 7 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as … · ☆76 · Updated last week
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" · ☆56 · Updated 9 months ago
- Source code, datasets, and scripts for the paper "GenderCARE: A Comprehensive Framework for Assessing and Re… · ☆25 · Updated last year
- Code for the paper "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models", IEEE ICASSP 2024. Demo: //124.220.228.133:11107 · ☆17 · Updated last year
- Official code for the ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis" · ☆60 · Updated last year
- ☆22 · Updated 2 months ago
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents · ☆110 · Updated 8 months ago
- ☆21 · Updated last year
- ☆26 · Updated 8 months ago
- Code and data for the paper "A Semantic Invariant Robust Watermark for Large Language Models", accepted by ICLR 2024 · ☆34 · Updated 11 months ago
- ☆32 · Updated 6 months ago
- ☆109 · Updated 9 months ago
- Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024) · ☆13 · Updated last year
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks · ☆32 · Updated last year
- ☆21 · Updated last year
- Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models · ☆26 · Updated 3 weeks ago