haizelabs / redteaming-resistance-benchmark
☆50Aug 3, 2024Updated last year
Alternatives and similar repositories for redteaming-resistance-benchmark
Users that are interested in redteaming-resistance-benchmark are comparing it to the libraries listed below
- LLM red teaming datasets from the paper 'Student-Teacher Prompting for Red Teaming to Improve Guardrails' for the ART of Safety Workshop …☆22Oct 12, 2023Updated 2 years ago
- This repo contains a demo of adversarial strings poisoning a vector database and forcing specific hallucinations on a RAG chatbot.☆10May 2, 2024Updated last year
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆37Feb 22, 2025Updated 11 months ago
- autoredteam: code for training models that automatically red team other language models☆15Aug 9, 2023Updated 2 years ago
- Automated Safety Testing of Large Language Models☆18Jan 31, 2025Updated last year
- ☆16May 30, 2024Updated last year
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- ☆48May 9, 2024Updated last year
- Implementation of BEAST adversarial attack for language models (ICML 2024)☆90May 14, 2024Updated last year
- [ACL'25] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench☆33Aug 12, 2025Updated 6 months ago
- ☆27Jul 20, 2024Updated last year
- Tree of Attacks (TAP) Jailbreaking Implementation☆118Feb 7, 2024Updated 2 years ago
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment☆108Mar 8, 2024Updated last year
- Run SWE-bench evaluations remotely☆56Aug 14, 2025Updated 6 months ago
- Showcase site / back office for API Entreprise☆12Updated this week
- The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster …☆59Sep 11, 2025Updated 5 months ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆104Apr 15, 2024Updated last year
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!☆349Oct 17, 2025Updated 4 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts☆191Jun 26, 2025Updated 7 months ago
- LLM Self Defense: By Self Examination, LLMs know they are being tricked☆48May 21, 2024Updated last year
- 【ACL 2024】 SALAD benchmark & MD-Judge☆170Mar 8, 2025Updated 11 months ago
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago
- Directly upload this code to your ESP8266 12E development board by changing its host name and password, then you will be able to obser…☆11May 31, 2016Updated 9 years ago
- KeepGPU is a simple CLI app that keeps your GPUs running.☆22Dec 9, 2025Updated 2 months ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- ☆12Jan 11, 2026Updated last month
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns☆13Mar 1, 2025Updated 11 months ago
- The Matlab/Octave code for our paper "Towards fast embedded moving horizon state-of-charge estimation for lithium-ion batteries"☆12May 21, 2024Updated last year
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated 11 months ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆50Dec 23, 2024Updated last year
- This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.☆90May 19, 2024Updated last year
- SecureDNA client and server components monorepo☆14Oct 20, 2025Updated 3 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"☆11Oct 11, 2024Updated last year
- A Probably Private mini-course introducing AI/ML security via interactive videos and hands-on examples.☆25Dec 3, 2025Updated 2 months ago
- Android Froyo+ app to auto answer calls☆11Mar 5, 2016Updated 9 years ago
- Golang syscall firehose (programmatic strace/dtruss)☆13Nov 26, 2020Updated 5 years ago
- This project auto-instruments containerized workloads in Kubernetes with New Relic agents.☆12Updated this week
- LLM benchmarks☆13Feb 22, 2024Updated last year