STAIR-BUPT / STAIR-LLMGuardrailsLinks

☆12

Alternatives and similar repositories for STAIR-LLMGuardrails

Users that are interested in STAIR-LLMGuardrails are comparing it to the libraries listed below

Sorting:

YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆38Updated 10 months ago
xingjunm / Awesome-Large-Model-Safety
Safety at Scale: A Comprehensive Survey of Large Model Safety
☆204Updated 9 months ago
IS2Lab / S-Eval
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
☆104Updated last month
AI45Lab / ActorAttack
☆111Updated 9 months ago
ltroin / llm_attack_defense_arena
☆82Updated 2 months ago
niconi19 / LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆107Updated last year
NY1024 / Foundation-Model-Paper-Notes
☆69Updated 6 months ago
thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆216Updated last year
thu-coai / SafetyBench
Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆261Updated 3 months ago
DPamK / BadAgent
☆24Updated 8 months ago
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆46Updated last month
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆175Updated last year
isXinLiu / Awesome-MLLM-Safety
Accepted by IJCAI-24 Survey Track
☆223Updated last year
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆218Updated this week
agiresearch / ASB
Agent Security Bench (ASB)
☆144Updated 3 weeks ago
chen37058 / Red-Team-Arxiv-Paper-Update
Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)
☆74Updated this week
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29Updated last year
ydyjya / LLM-IHS-Explanation
☆55Updated last year
huizhang-L / CodeChameleon
☆25Updated last year
kriti-hippo / red_queen
Red Queen Dataset and data generation template
☆20Updated last year
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆106Updated 10 months ago
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆99Updated last year
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆178Updated 4 months ago
Allen-piexl / JailbreakZoo
☆153Updated last year
Xianjun-Yang / Awesome_papers_on_LLMs_detection
The lastest paper about detection of LLM-generated text and code
☆280Updated 5 months ago
guardagent / code
☆27Updated 10 months ago
casperllm / CASPER
☆15Updated last year
tmllab / 2025_ICLR_PiF
☆37Updated 6 months ago
OSU-NLP-Group / AgentSafety
☆130Updated 3 weeks ago
alphadl / SafeLLM_with_IntentionAnalysis
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆18Updated last year