Trust4AI / ASTRALLinks

Automated Safety Testing of Large Language Models

☆17

Alternatives and similar repositories for ASTRAL

Users that are interested in ASTRAL are comparing it to the libraries listed below

Sorting:

AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆94Updated 11 months ago
sunblaze-ucb / cybergym
CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on…
☆86Updated last month
llm-platform-security / SecGPT
An Execution Isolation Architecture for LLM-Based Agentic Systems
☆97Updated 9 months ago
controllability / jailbreak-evaluation
The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation.
☆27Updated last year
CryptoAILab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆172Updated 7 months ago
andyzorigin / cybench
☆165Updated 4 months ago
briland / LLM-security-and-privacy
LLM security and privacy
☆51Updated last year
LostOxygen / llm-confidentiality
Whispers in the Machine: Confidentiality in Agentic Systems
☆41Updated this week
uiuc-kang-lab / InjecAgent
☆89Updated last year
invariantlabs-ai / invariant-gateway
LLM proxy to observe and debug what your AI agents are doing.
☆52Updated 3 months ago
ZenGuard-AI / fast-llm-security-guardrails
The fastest Trust Layer for AI Agents
☆144Updated 5 months ago
yuchen814 / CodeHalu
☆17Updated last year
HumanCompatibleAI / tensor-trust
A prompt injection game to collect data for robust ML research
☆65Updated 9 months ago
Libr-AI / OpenRedTeaming
Papers about red teaming LLMs and Multimodal models.
☆152Updated 5 months ago
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt injection attacks and defenses
☆330Updated last week
microsoft / TaskTracker
TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a…
☆74Updated 2 months ago
jpmorganchase / CyberBench
CyberBench: A Multi-Task Cyber LLM Benchmark
☆24Updated 6 months ago
CS-EVAL / CS-Eval
CS-Eval is a comprehensive evaluation suite for fundamental cybersecurity models or large language models' cybersecurity ability.
☆54Updated 11 months ago
XHMY / AutoDefense
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
☆56Updated 5 months ago
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆163Updated 6 months ago
ethz-spylab / agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
☆340Updated last week
thu-coai / Backdoor-Data-Extraction
☆29Updated 5 months ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆74Updated 3 months ago
SproutNan / AI-Safety_Benchmark
The official repository for guided jailbreak benchmark
☆24Updated 3 months ago
declare-lab / ferret
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
☆18Updated last year
PurCL / ASTRA
🥇 Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeati…
☆62Updated 2 months ago
morpheuslord / CVE-llm_dataset
This is a dataset intended to train a LLM model for a completely CVE focused input and output.
☆63Updated 4 months ago
agiresearch / ASB
Agent Security Bench (ASB)
☆141Updated last week
lve-org / lve
A repository of Language Model Vulnerabilities and Exposures (LVEs).
☆112Updated last year
RobustNLP / DeRTa
A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state.
☆70Updated 5 months ago