Babelscape / ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
Related projects
Alternatives and complementary repositories for ALERT
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…
- Weak-to-Strong Jailbreaking on Large Language Models
- The official repository of the paper "On the Exploitability of Instruction Tuning".
- Evaluating LLMs with fewer examples
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
- [ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.…
- [ICLR 2024] Paper showing properties of safety tuning and exaggerated safety.
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
- A package for running benchmark agreement testing
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
- Code for "In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering"
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper "R-Tuning: Instructing Large Language Models to Say 'I Don't…
- Code accompanying "How I learned to start worrying about prompt formatting".
- [ACL 2024] SALAD benchmark & MD-Judge
- Papers about red teaming LLMs and multimodal models.
- Codebase accompanying the "Summary of a Haystack" paper.
- Scalable Meta-Evaluation of LLMs as Evaluators
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales