chziakas / redevalLinks
A library for red-teaming LLM applications with LLMs.
☆28Updated 11 months ago
Alternatives and similar repositories for redeval
Users that are interested in redeval are comparing it to the libraries listed below
Sorting:
- Red-Teaming Language Models with DSPy☆213Updated 7 months ago
- ☆148Updated 3 months ago
- ☆86Updated 10 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.☆96Updated 5 months ago
- ☆29Updated 4 months ago
- ☆25Updated 11 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"☆45Updated last year
- The fastest Trust Layer for AI Agents☆145Updated 3 months ago
- ☆34Updated 10 months ago
- A prompt injection game to collect data for robust ML research☆63Updated 7 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper☆50Updated 5 months ago
- Code for the paper "Fishing for Magikarp"☆165Updated 4 months ago
- Sphynx Hallucination Induction☆53Updated 7 months ago
- Papers about red teaming LLMs and Multimodal models.☆139Updated 3 months ago
- LMAP (large language model mapper) is like NMAP for LLM, is an LLM Vulnerability Scanner and Zero-day Vulnerability Fuzzer.☆24Updated 11 months ago
- Here Comes the AI Worm: Preventing the Propagation of Adversarial Self-Replicating Prompts Within GenAI Ecosystems☆205Updated 2 weeks ago
- Code to break Llama Guard☆32Updated last year
- Collection of evals for Inspect AI☆233Updated this week
- General research for Dreadnode☆25Updated last year
- A repository of Language Model Vulnerabilities and Exposures (LVEs).☆114Updated last year
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a…☆420Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".☆115Updated last year
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.☆48Updated 10 months ago
- Test your AI model's security through CLI☆30Updated last week
- ☆48Updated last year
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover…☆48Updated last year
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique☆18Updated last year
- ☆19Updated last year
- A benchmark for prompt injection detection systems.☆136Updated 3 weeks ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆75Updated 9 months ago