wyf23187 / LLM_CDV

☆25

Alternatives and similar repositories for LLM_CDV

Users who are interested in LLM_CDV are comparing it to the repositories listed below.


xzx34 / Cross-Lingual-Pitfalls
[ACL 2025] Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models
☆42 · Updated 3 months ago
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆43 · Updated 9 months ago
ydyjya / LLM-IHS-Explanation
☆51 · Updated last year
TrustGen / TrustEval-toolkit
Toolkit for evaluating the trustworthiness of generative foundation models.
☆112 · Updated last week
git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attacks against large language models
☆205 · Updated last week
listen0425 / Safety-Layers
Code for the paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆11 · Updated 4 months ago
hzy312 / Awesome-LLM-Watermark
Up-to-date list of LLM watermark papers. 🔥🔥🔥
☆354 · Updated 8 months ago
inspire-group / RobustRAG
☆19 · Updated 11 months ago
microsoft / ValueCompass
☆25 · Updated 10 months ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆27 · Updated last year
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as …
☆69 · Updated last week
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆183 · Updated 6 months ago
Dtc7w3PQ / Visco-Attack
☆19 · Updated last week
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆149 · Updated 10 months ago
niconi19 / LLM-Conversation-Safety
[NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆106 · Updated last year
chen37058 / Red-Team-Arxiv-Paper-Update
Awesome jailbreak and red-teaming arXiv papers (automatically updated every 12 hours)
☆53 · Updated last week
NY1024 / Foundation-Model-Paper-Notes
☆60 · Updated 3 months ago
Xianjun-Yang / Awesome_papers_on_LLMs_detection
The latest papers on detecting LLM-generated text and code
☆276 · Updated 2 months ago
CS-BAOYAN / CSInternship2025
☆59 · Updated last month
ZhiningLiu1998 / SelfElicit
[ACL'25 Main] SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence! | Help your LLM make better use of context documents: a simple attention-based approach
☆23 · Updated 6 months ago
OpenSafetyLab / SALAD-BENCH
[ACL 2024] SALAD benchmark & MD-Judge
☆158 · Updated 5 months ago
StarDewXXX / Awesome-Hybrid-CoT-Reasoning
☆51 · Updated 2 months ago
AmourWaltz / Reliable-LLM
☆157 · Updated 11 months ago
Jihuai-wpy / InferAligner
☆34 · Updated 11 months ago
cloudygoose / MiniAgents
The MiniAgents visualization tool for simulacra.
☆17 · Updated last year
AI45Lab / ActorAttack
☆101 · Updated 7 months ago
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆20 · Updated 9 months ago
PKU-YuanGroup / Reasoning-Attack
☆135 · Updated 6 months ago
wbopan / safety-residual-space
☆16 · Updated 5 months ago
jianghoucheng / AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
☆314 · Updated last month