kstanghere / GenderCARE-ccs24
This repository contains the source code, datasets, and scripts for the paper "GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models," accepted by ACM CCS 2024 (camera-ready version under preparation).
Alternatives and similar repositories for GenderCARE-ccs24:
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
- Code for watermarking language models
- Python package for measuring memorization in LLMs
- [ACL 2024] SALAD benchmark & MD-Judge
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
- [NeurIPS 2024] Official implementation of "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
- Accepted by ECCV 2024
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (NeurIPS 2024)
- Code repo of the paper "Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis" (https://arxiv.org/abs/2406.10794…
- Code and data for the paper "A Semantic Invariant Robust Watermark for Large Language Models", accepted at ICLR 2024
- [ECCV 2024] Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
- Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep"
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety
- [arXiv 2024] Dissecting Adversarial Robustness of Multimodal LM Agents
- Official repository for the ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding"
- A collection of automated evaluators for assessing jailbreak attempts