kstanghere / GenderCARE-ccs24
This repository contains the source code, datasets, and scripts for the paper "GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models," accepted by ACM CCS 2024 (camera-ready version under preparation).
☆20 · Updated 7 months ago
Alternatives and similar repositories for GenderCARE-ccs24:
Users who are interested in GenderCARE-ccs24 compare it to the repositories listed below.
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794… ☆19 · Updated 8 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆58 · Updated 2 months ago
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety. ☆78 · Updated 10 months ago
- Awesome Jailbreak, red teaming arXiv papers (automatically updated every 12 hours) ☆23 · Updated this week
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆128 · Updated last month
- Python package for measuring memorization in LLMs. ☆148 · Updated 4 months ago
- ☆18 · Updated last year
- ☆19 · Updated last month
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)