STAIR-BUPT / JailBench
JailBench: A Chinese dataset for evaluating jailbreak attack risks in large language models
☆20Updated 4 months ago
Related projects
Alternatives and complementary repositories for JailBench
- Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]☆154Updated 4 months ago
- SC-Safety: A multi-round adversarial safety benchmark for Chinese large language models☆105Updated 7 months ago
- Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.☆33Updated 5 months ago
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]☆156Updated last month
- Official github repo for AutoDetect, an automated weakness detection framework for LLMs.☆38Updated 4 months ago
- The code and resource of "Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmark"…☆51Updated 2 weeks ago
- A curated reading list for large language model (LLM) alignment. Take a look at our new survey "Large Language Model Alignment: A Survey"…☆71Updated last year
- ☆23Updated 2 weeks ago
- SeqXGPT: An advanced method for sentence-level AI-generated text detection.☆75Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆60Updated last month
- ☆88Updated 2 months ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models☆23Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"☆83Updated last month
- [NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey☆76Updated 3 months ago
- ☆71Updated 10 months ago
- LAiW: A Chinese Legal Large Language Models Benchmark☆72Updated 4 months ago
- The code and resource of "Towards Comprehensive Detection of Chinese Harmful Memes" (NeurIPS2024 D&B).☆21Updated this week
- [ACL 2024] SALAD benchmark & MD-Judge☆103Updated last month
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆97Updated last week
- [NLPCC 2024] Shared Task 10: Regulating Large Language Models☆13Updated 5 months ago
- ☆61Updated 2 months ago
- Chinese large language model evaluation, phase 3☆24Updated 5 months ago
- Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning☆32Updated 9 months ago
- Official Repo of paper "KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction". In the paper, we propose …☆58Updated 3 months ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM☆17Updated 4 months ago
- Accompanying repo for the DP2O paper accepted by AAAI 2024 main conference☆14Updated 7 months ago
- ☆48Updated 8 months ago
- Fine-tuning large language models with the DPO algorithm; simple and easy to get started.☆28Updated 4 months ago
- ☆91Updated 11 months ago
- Code for paper "Defending against LLM Jailbreaking via Backtranslation"☆24Updated 2 months ago