thu-coai/LongSafety

[ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models

☆16

Alternatives and similar repositories for LongSafety

Users who are interested in LongSafety are comparing it to the repositories listed below.

thu-coai / BARREL
[ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
☆18 · May 21, 2025 · Updated last year
thu-coai / Backdoor-Data-Extraction
☆33 · May 22, 2025 · Updated last year
thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆32 · Jul 9, 2024 · Updated 2 years ago
yangjunx21 / Paper-Pulse
Focused Papers, Delivered Simply ：）
☆55 · Dec 25, 2025 · Updated 7 months ago
thu-coai / VPO
☆25 · Jul 20, 2025 · Updated last year
jiah-li / magic
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆15 · Dec 16, 2024 · Updated last year
alphadl / SafeLLM_with_IntentionAnalysis
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21 · Mar 25, 2024 · Updated 2 years ago
thu-coai / AISafetyLab
AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.
☆248 · Apr 21, 2026 · Updated 3 months ago
dayu11 / Availability-Attacks-Create-Shortcuts
☆10 · Jul 28, 2022 · Updated 3 years ago
TaiMingLu / know-dont-tell
☆19 · Oct 14, 2024 · Updated last year
TrustAI-laboratory / Many-Shot-Jailbreaking-Demo
Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…
☆17 · Aug 6, 2024 · Updated last year
TomSheng21 / AdaptGuard
ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation
☆11 · Dec 23, 2023 · Updated 2 years ago
CHATS-lab / LLMs_Encode_Harmfulness_Refusal_Separately
☆41 · Jul 3, 2026 · Updated 3 weeks ago
thu-coai / JPS
[MM'25] JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
☆22 · Dec 23, 2025 · Updated 7 months ago
chtmp223 / suri
Suri: Multi-constraint instruction following for long-form text generation [EMNLP’24]
☆27 · Oct 3, 2025 · Updated 9 months ago
Freder-chen / ReasonGenRM
A simple implementation of ReasonGenRM.
☆19 · Apr 21, 2025 · Updated last year
zcrwind / PREFER
☆22 · Dec 9, 2023 · Updated 2 years ago
ssbuild / aigc_evals
AIGC evals
☆10 · Dec 2, 2023 · Updated 2 years ago
konpanousis / Adversarial-LWTA-AutoAttack
☆12 · May 6, 2022 · Updated 4 years ago
bebr2 / THUComputerGraphics
Tsinghua University Computer Graphics course project, Spring 2022 semester
☆12 · Mar 4, 2023 · Updated 3 years ago
sooonwoo / CL-Baselines
This is a PyTorch implementation of contrastive learning (CL) baselines.
☆14 · Aug 29, 2022 · Updated 3 years ago
OPTML-Group / Unlearn-Trace
[ICLR26] Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
☆24 · Apr 8, 2026 · Updated 3 months ago
thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆231 · Sep 29, 2024 · Updated last year
THU-KEG / OpenSAE
☆49 · Apr 12, 2026 · Updated 3 months ago
guanjiyang / SAC
☆18 · Oct 7, 2022 · Updated 3 years ago
thu-coai / Agent-SafetyBench
☆149 · Aug 11, 2025 · Updated 11 months ago
AI45Lab / VLSBench
[ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety
☆62 · Jul 21, 2025 · Updated last year
bebr2 / RACE
Code for RACE.
☆15 · Nov 12, 2025 · Updated 8 months ago
saferlhf-v / saferlhf-v
☆23 · Jun 16, 2025 · Updated last year
sheep333c / DIVE
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
☆27 · Mar 13, 2026 · Updated 4 months ago
zdou0830 / crosslingual_summarization_semantic
☆10 · Jun 13, 2020 · Updated 6 years ago
git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attacks for large language models (ACM CSUR)
☆247 · Jun 22, 2026 · Updated last month
facebookresearch / HalluLens
Codebase for LLM Textual Hallucination Benchmark
☆84 · Apr 25, 2025 · Updated last year
montehoover / DynaGuard
Code for "DynaGuard: A Dynamic Guardrail Model With User-Defined Policies."
☆23 · Nov 3, 2025 · Updated 8 months ago
vivekvar-dl / GSPO-DeepSeek-R1-Distill-Qwen-1.5B
☆18 · Mar 15, 2026 · Updated 4 months ago
aptsunny / Ensemble-One-Shot-NAS
Automated neural architecture search algorithms implemented in PyTorch and Autogluon toolkit.
☆12 · Apr 17, 2020 · Updated 6 years ago
sleeepeer / PISanitizer
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
☆18 · Dec 10, 2025 · Updated 7 months ago
nehemya / Algo-Trade-Adversarial-Examples
todo: desc
☆11 · Aug 12, 2021 · Updated 4 years ago
cordercorder / nmt-multi
Codebase for multilingual neural machine translation
☆13 · Nov 24, 2022 · Updated 3 years ago