☆51 · Aug 10, 2024 · Updated last year
Alternatives and similar repositories for certified-llm-safety
Users who are interested in certified-llm-safety are comparing it to the repositories listed below
- ☆128 · Nov 13, 2023 · Updated 2 years ago
- Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing' ☆23 · Jun 9, 2024 · Updated last year
- ☆15 · Oct 5, 2020 · Updated 5 years ago
- ☆20 · Nov 4, 2025 · Updated 4 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆57 · Aug 17, 2024 · Updated last year
- Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models ☆37 · Jun 1, 2025 · Updated 9 months ago
- ☆60 · Mar 9, 2023 · Updated 3 years ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆69 · Oct 23, 2024 · Updated last year
- ☆32 · Feb 13, 2024 · Updated 2 years ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆29 · Jul 9, 2024 · Updated last year
- ☆31 · Oct 7, 2021 · Updated 4 years ago
- Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ☆31 · Oct 26, 2023 · Updated 2 years ago
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ☆153 · Sep 2, 2025 · Updated 6 months ago
- ☆11 · Jun 20, 2023 · Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Sep 24, 2024 · Updated last year
- ☆57 · May 21, 2025 · Updated 9 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆174 · Apr 23, 2025 · Updated 10 months ago
- [arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker" ☆173 · Feb 20, 2024 · Updated 2 years ago
- ☆11 · Sep 2, 2024 · Updated last year
- Detection of adversarial examples using influence functions and nearest neighbors ☆37 · Nov 22, 2022 · Updated 3 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 · Dec 13, 2024 · Updated last year
- Official implementation of the paper "RaceMOP: Mapless Online Path Planning for Multi-Agent Autonomous Racing using Residual Policy Learn… ☆10 · Oct 23, 2024 · Updated last year
- ☆10 · Apr 25, 2024 · Updated last year
- Computing with Eigenvalue Distributions of Large Random Matrices of the Covariance Type ☆15 · Feb 16, 2018 · Updated 8 years ago
- ☆10 · Mar 8, 2025 · Updated last year
- A Benchmark for Evaluating Safety and Trustworthiness in Web Agents for Enterprise Scenarios ☆19 · Updated this week
- ☆14 · Feb 2, 2025 · Updated last year
- Python package for Geometric / Clifford Algebra with Pytorch. ☆14 · Jan 25, 2026 · Updated last month
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆163 · Nov 30, 2024 · Updated last year
- The Code for Homeomorphic Projection ☆13 · Sep 21, 2023 · Updated 2 years ago
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns ☆13 · Mar 1, 2025 · Updated last year
- ☆44 · Oct 1, 2024 · Updated last year
- A fast + lightweight implementation of the GCG algorithm in PyTorch ☆319 · May 13, 2025 · Updated 9 months ago
- [PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min… ☆10 · Aug 13, 2024 · Updated last year
- Graphical user interface for text-guided face editing ☆11 · Jan 18, 2023 · Updated 3 years ago
- ☆13 · Mar 16, 2025 · Updated 11 months ago
- Official Code For EMNLP2025 Findings: {DLPO : Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Le… ☆10 · Dec 25, 2025 · Updated 2 months ago
- ☆13 · May 10, 2025 · Updated 9 months ago
- Thermal rating calculations of power transmission lines in Python. ☆12 · Feb 3, 2021 · Updated 5 years ago