Trust4AI / ASTRAL
Automated Safety Testing of Large Language Models
☆15 · Updated 4 months ago
Alternatives and similar repositories for ASTRAL
Users interested in ASTRAL are comparing it to the libraries listed below.
- ☆63 · Updated 11 months ago
- ☆71 · Updated 6 months ago
- Whispers in the Machine: Confidentiality in Agentic Systems · ☆37 · Updated 2 weeks ago
- ☆32 · Updated 3 weeks ago
- ☆20 · Updated 2 weeks ago
- LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models · ☆20 · Updated last week
- Official implementation of the paper "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers" · ☆52 · Updated 9 months ago
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts · ☆158 · Updated 2 months ago
- A prompt injection game to collect data for robust ML research · ☆61 · Updated 4 months ago
- An Execution Isolation Architecture for LLM-Based Agentic Systems · ☆80 · Updated 4 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique · ☆16 · Updated 9 months ago
- Risks and targets for assessing LLMs & LLM vulnerabilities · ☆30 · Updated last year
- CS-Eval is a comprehensive evaluation suite for assessing the cybersecurity capabilities of cybersecurity foundation models and large language models · ☆43 · Updated 6 months ago
- ☆80 · Updated last month
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" · ☆48 · Updated 2 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming" · ☆42 · Updated 8 months ago
- Code repo for the paper "Attacking Vision-Language Computer Agents via Pop-ups" · ☆32 · Updated 5 months ago
- [ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast · ☆101 · Updated last year
- Agent Security Bench (ASB) · ☆81 · Updated last month
- LLM security and privacy · ☆49 · Updated 7 months ago
- LLM Self Defense: By Self Examination, LLMs know they are being tricked · ☆34 · Updated last year
- Attack to induce hallucinations in LLMs · ☆154 · Updated last year
- ☆45 · Updated last year
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM jailbreaking. (NeurIPS 2024) · ☆135 · Updated 6 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models · ☆76 · Updated last month
- ☆40 · Updated 8 months ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state · ☆60 · Updated 2 weeks ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) · ☆76 · Updated 3 weeks ago
- This repository provides a benchmark for prompt injection attacks and defenses · ☆216 · Updated this week
- jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation · ☆23 · Updated 7 months ago