thu-coai/AISafetyLab

AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.

☆248

Alternatives and similar repositories for AISafetyLab

Users who are interested in AISafetyLab are comparing it to the libraries listed below.

leileqiTHU / Attacker
The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13 • Apr 23, 2025 • Updated last year
thu-coai / TransferAttack
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
☆19 • May 23, 2025 • Updated last year
Beijing-AISI / panda-guard
Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs).
☆68 • Mar 23, 2026 • Updated 3 months ago
thu-coai / LongSafety
[ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models
☆16 • Jun 18, 2025 • Updated last year
yangjunx21 / Paper-Pulse
Focused Papers, Delivered Simply ：）
☆55 • Dec 25, 2025 • Updated 6 months ago
s3ndd / sen-graphql-go
☆80 • Jun 8, 2025 • Updated last year
wenhaoli-xmu / seco
☆163 • Nov 16, 2025 • Updated 8 months ago
monshunter / goat
GOAT (Golang Application Tracing) - A high-performance code tracing tool for gray releases in Go applications, featuring automatic increm…
☆67 • Jun 2, 2025 • Updated last year
JusperLee / AudioTrust
AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models
☆215 • Jan 28, 2026 • Updated 5 months ago
Irreel / AnyActions
☆132 • Feb 15, 2025 • Updated last year
CoderLineChan / SwiftlyUI
UIKit Plus: Infusing SwiftUI-like Development Efficiency. Revolutionizing UIKit development through chain syntax, resultBuilder, and mode…
☆261 • Apr 15, 2026 • Updated 3 months ago
wicai24 / DOOR-Alignment
☆20 • Apr 7, 2025 • Updated last year
lyanlin96 / Application-Security-Ingress-Controller
☆277 • Apr 29, 2025 • Updated last year
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆187 • Apr 23, 2025 • Updated last year
kelvinfkr / adaptive-strategies-for-climate-change-adaptation-An-application-for-flood-risk-management
data and codes for adaptive strategies for climate change adaptation: An application for flood risk management
☆134 • Feb 13, 2025 • Updated last year
Tele-EVOL / TeleAI-Safety
☆27 • Jan 5, 2026 • Updated 6 months ago
thu-coai / BARREL
[ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
☆18 • May 21, 2025 • Updated last year
360CVGroup / WISA
World Simulator Assistant for Physics-Aware Text-to-Video Generation
☆276 • Sep 22, 2025 • Updated 9 months ago
rainbowyuyu / manim_extend_rainbow
Improvements to animations based on Manim, designed to facilitate the demonstration of algorithms in data structures, operating systems, …
☆206 • Dec 15, 2025 • Updated 7 months ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as …
☆84 • Updated this week
JinfengHU99 / inappropriate_behavior_chatbot
Study of the optimization of chatbot behavior based on LLMs in the face of inappropriate behaviors in French conversations using semantic…
☆37 • Dec 9, 2024 • Updated last year
ShuaiLyu0110 / HACAN
HACAN: Hybrid Attention-Driven Cross-Layer Alignment Network for Image-Text Retrieval
☆79 • Apr 30, 2025 • Updated last year
nonamev-ls / SCIE_MCE
Major Color Extract using SWASA and S-CIELAB
☆231 • Jun 7, 2025 • Updated last year
s3ndd / cryptor
`cryptor` is a Go package for secure encryption and decryption using NaCl's `secretbox` from `golang.org/x/crypto`
☆60 • Jun 8, 2025 • Updated last year
CryptoAILab / misalignment
[NDSS'25] The official implementation of safety misalignment.
☆19 • Jan 8, 2025 • Updated last year
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆211 • Jun 26, 2025 • Updated last year
GaohaoZhou-ops / JetsonYoloROS
This repository implements Yolo functionality using TensorRT and CUDA acceleration on Nvidia Jetson devices and the ROS framework.
☆205 • Aug 14, 2025 • Updated 11 months ago
fefergrgrgrg / smileyCoin
simple web ui to manage mcp (model context protocol) servers in the claude app
☆103 • May 16, 2025 • Updated last year
fefergrgrgrg / smileyCoinDev
☆15 • May 16, 2025 • Updated last year
jiaxiaojunQAQ / I-GCG
Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)
☆146 • Apr 7, 2025 • Updated last year
kaitoInfra / fast-twitter-api
Simple yet powerful Twitter data retrieval SDK with multi-language support. No Limits, No Auth Required
☆183 • May 28, 2026 • Updated last month
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 • Jul 9, 2024 • Updated 2 years ago
thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆32 • Jul 9, 2024 • Updated 2 years ago
WangCheng0116 / Awesome-LRMs-Safety
Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning …
☆90 • Aug 25, 2025 • Updated 10 months ago
ShuaiLyu0110 / SQL-o1
SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL
☆197 • May 23, 2025 • Updated last year
wenlongliaoEE / loadforecast
☆105 • Jan 24, 2025 • Updated last year
thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆231 • Sep 29, 2024 • Updated last year
SproutNan / AI-Safety_Benchmark
The official repository for guided jailbreak benchmark
☆31 • Jul 28, 2025 • Updated 11 months ago
THESIS-AGENT / AIRouter
🚀 AIRouter - Intelligent AI router: provides a unified API interface for multiple LLM providers, with support for load balancing, failover, and intelligent routing | Intelligent AI Router with unified API interface, load balancing, and smart r…
☆179 • Aug 28, 2025 • Updated 10 months ago