ydyjya / Awesome-LLM-Safety
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the safety implications, challenges, and advancements surrounding these powerful models.
☆1,580 · Updated last week
Alternatives and similar repositories for Awesome-LLM-Safety
Users interested in Awesome-LLM-Safety are comparing it to the libraries listed below.
- A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.). ☆1,649 · Updated last week
- Papers and resources related to the security and privacy of LLMs 🤖 ☆533 · Updated 3 months ago
- A curation of awesome tools, documents, and projects about LLM security. ☆1,380 · Updated 3 weeks ago
- 😎 An up-to-date, curated list of papers, methods, and resources on attacks against Large Vision-Language Models. ☆383 · Updated 2 weeks ago
- A resource repository for machine unlearning in large language models ☆481 · Updated last month
- Awesome papers on LLM interpretability ☆553 · Updated 3 weeks ago
- Reading list on hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large …" ☆1,043 · Updated 9 months ago
- An easy-to-use Python framework to generate adversarial jailbreak prompts. ☆715 · Updated 5 months ago
- Up-to-date list of LLM watermarking papers. 🔥🔥🔥 ☆354 · Updated 9 months ago
- Must-read papers on knowledge editing for large language models. ☆1,152 · Updated 2 months ago
- [ICLR 2024] Official implementation of "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…" ☆378 · Updated 7 months ago
- An awesome collection of LLM surveys ☆378 · Updated 3 months ago
- Safety at Scale: A Comprehensive Survey of Large Model Safety ☆191 · Updated 6 months ago
- Daily-updated LLM papers; subscriptions welcome 👏, and a star 🌟 is appreciated if you find it useful. ☆1,171 · Updated last year
- "Stones from other hills may serve to polish jade": Fudan Whitzard (复旦白泽智能) releases JADE-DB, a demo dataset targeting Chinese open-source and international commercial large models. ☆450 · Updated 2 months ago
- Accepted by IJCAI-24 Survey Track ☆214 · Updated last year
- [ICML 2024] TrustLLM: Trustworthiness in Large Language Models ☆594 · Updated 2 months ago
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models ☆187 · Updated 6 months ago
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆530 · Updated 11 months ago
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆414 · Updated 5 months ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆250 · Updated last month
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆106 · Updated last year
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆208 · Updated 11 months ago
- A survey on harmful fine-tuning attacks for large language models ☆206 · Updated this week
- Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, data… ☆905 · Updated last week
- This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicit… ☆1,167 · Updated 6 months ago
- SecProbe: a task-driven safety-capability evaluation system for large models ☆14 · Updated 9 months ago
- MarkLLM: An Open-Source Toolkit for LLM Watermarking (EMNLP 2024 System Demonstration) ☆609 · Updated this week
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models ☆211 · Updated last month
- The latest papers on detection of LLM-generated text and code ☆277 · Updated 2 months ago