ydyjya / Awesome-LLM-Safety
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the safety implications, challenges, and advancements surrounding these powerful models.
☆1,005 · Updated this week
Related projects
Alternatives and complementary repositories for Awesome-LLM-Safety
- A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.). ☆948 · Updated this week
- Papers and resources related to the security and privacy of LLMs 🤖 ☆433 · Updated 2 months ago
- A curation of awesome tools, documents and projects about LLM security. ☆955 · Updated this week
- An awesome collection of LLM surveys. ☆310 · Updated 2 months ago
- Daily updated LLM papers; subscriptions welcome 👏 and stars appreciated 🌟. ☆981 · Updated 3 months ago
- Reading list on hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models". ☆943 · Updated 2 months ago
- Must-read Papers on Knowledge Editing for Large Language Models. ☆926 · Updated this week
- An easy-to-use Python framework to generate adversarial jailbreak prompts. ☆479 · Updated 2 months ago
- "他山之石、可以攻玉":复旦白泽智能发布面向国内开源和国外商用大模型的Demo数据集JADE-DB☆313Updated last week
- 😎 An up-to-date, curated list of papers, methods, and resources on attacks against large vision-language models. ☆133 · Updated last week
- The latest papers on detecting LLM-generated text and code. ☆216 · Updated last week
- Accepted by the IJCAI-24 Survey Track. ☆159 · Updated 2 months ago
- LLM hallucination paper list. ☆293 · Updated 8 months ago
- Chinese safety prompts for evaluating and improving the safety of LLMs. ☆871 · Updated 8 months ago
- The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models". ☆245 · Updated 3 weeks ago
- Up-to-date list of LLM watermarking papers. 🔥🔥🔥 ☆293 · Updated this week
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆157 · Updated 4 months ago
- Continuously updated list of related resources for generative LLMs like GPT and their analysis and detection. ☆197 · Updated 2 months ago
- Awesome papers on LLM interpretability. ☆325 · Updated this week
- Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models. ☆674 · Updated 5 months ago
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆76 · Updated 3 months ago
- Aligning Large Language Models with Human: A Survey ☆700 · Updated last year
- [ICML 2024] TrustLLM: Trustworthiness in Large Language Models ☆468 · Updated last month
- MarkLLM: An Open-Source Toolkit for LLM Watermarking. (EMNLP 2024 Demo; see the illustrative watermark-detection sketch after this list.) ☆292 · Updated this week
- Papers related to LLM agents published at top conferences. ☆305 · Updated 9 months ago
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20. ☆241 · Updated 8 months ago
- [ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future ☆338 · Updated 4 months ago
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆158 · Updated last month
- The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity". ☆327 · Updated 6 months ago
- Research on evaluating and aligning the values of Chinese large language models. ☆479 · Updated last year
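
Several of the watermarking entries above (the watermark paper list and the MarkLLM toolkit) center on the green/red-list scheme of Kirchenbauer et al. (2023). Below is a minimal, illustrative Python sketch of the detection side of that scheme; the function names, the PRNG-as-hash shortcut, and the toy vocabulary size are assumptions for illustration, not MarkLLM's actual API.

```python
import math
import random


def green_list(prev_token_id: int, vocab_size: int, gamma: float = 0.5) -> set:
    """Recompute the 'green' vocabulary subset from the previous token.

    Seeding a PRNG with the previous token id (a stand-in for the
    scheme's hash function) yields the same vocabulary partition at
    generation time and at detection time, so no model access is needed.
    """
    rng = random.Random(prev_token_id)  # hypothetical hash; real schemes use a keyed hash
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_z_score(token_ids: list[int], vocab_size: int, gamma: float = 0.5) -> float:
    """Z-score of the observed green-token count.

    For unwatermarked text each token lands in the green list with
    probability gamma, so the count is roughly Binomial(T, gamma); a
    large positive z-score suggests the generator boosted green logits.
    """
    hits = sum(
        cur in green_list(prev, vocab_size, gamma)
        for prev, cur in zip(token_ids, token_ids[1:])
    )
    t = len(token_ids) - 1  # number of scored positions
    return (hits - gamma * t) / math.sqrt(gamma * (1 - gamma) * t)


if __name__ == "__main__":
    # Random token ids model unwatermarked text: z should hover near 0.
    tokens = [random.randrange(50_000) for _ in range(200)]
    print(f"z = {watermark_z_score(tokens, vocab_size=50_000):.2f}")
```

Generation-side watermarking adds a small bias to green-list logits at each decoding step; as the sketch shows, detection then needs only the token ids and the shared hashing scheme, not the model itself.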