Flames is a highly adversarial Chinese-language benchmark for evaluating the harmlessness of LLMs, developed by the Shanghai AI Lab and the Fudan NLP Group.
☆63, updated May 21, 2024
Alternatives and similar repositories for Flames
Users that are interested in Flames are comparing it to the libraries listed below
- ☆45, updated Jun 19, 2025
- S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models (☆111, updated Feb 13, 2026)
- [EMNLP 2024] "ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models" (☆26, updated Jun 24, 2024)
- ☆30, updated Aug 9, 2023
- ☆21, updated Aug 19, 2024
- ☆15, updated Jan 9, 2026
- Research on evaluating and aligning the values of Chinese LLMs (☆555, updated Jul 20, 2023)
- Repo for the paper "Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge" (☆14, updated Feb 20, 2024)
- GAOKAO-Bench-Updates, a supplement to GAOKAO-Bench, a dataset for evaluating large language models (☆39, updated Jan 7, 2025)
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types (☆25, updated Nov 29, 2024)
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] (☆226, updated Sep 29, 2024)
- Open-source red-teaming framework for MLLMs with 42+ attack methods (☆233, updated this week)
- SC-Safety: a multi-turn adversarial safety benchmark for Chinese LLMs (☆150, updated Mar 15, 2024)
- ☆17, updated Oct 15, 2023
- Official GitHub repo for SafetyBench, a comprehensive benchmark for evaluating LLMs' safety [ACL 2024] (☆274, updated Jul 28, 2025)
- Accepted by ECCV 2024 (☆193, updated Oct 15, 2024)
- Chinese safety prompts for evaluating and improving the safety of LLMs (☆1,136, updated Feb 27, 2024)
- An active inference model of Lacanian psychoanalysis (☆16, updated Jun 7, 2025)
- [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform" (☆64, updated May 16, 2025)
- ☆30, updated Feb 16, 2024
- ☆17, updated Nov 3, 2024
- ☆39, updated Jun 25, 2025
- Verifies MAPPO on the `simple_spread_v3` task (☆15, updated Aug 10, 2024)
- [ICLR 2025] Official implementation of "SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanati…" (☆43, updated Feb 11, 2025)
- Code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector" (☆47, updated Oct 13, 2025)
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" (☆136, updated Jun 5, 2024)
- [ICML 2025] Official repository for the paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" (☆25, updated Mar 4, 2025)
- GSM-Plus: data, code, and evaluation for enhancing robust mathematical reasoning in math word problems (☆64, updated Jul 8, 2024)
- A system that turns jailbreak papers into runnable attacks and benchmarks, live as research evolves (☆24, updated Mar 12, 2026)
- Adversarial attack for pre-trained code models (☆10, updated Jul 19, 2022)
- Fudan Baize (白泽) LLM safety benchmark test set, Summer 2024 edition (☆51, updated Jul 31, 2024)
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs) (☆176, updated Oct 27, 2023)
- [ISSTA'24] A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing (☆12, updated Jan 7, 2025)
- Code for the Findings of EMNLP 2023 paper "Multi-step Jailbreaking Privacy Attacks on ChatGPT" (☆36, updated Oct 15, 2023)
- Official repository for the ACL 2025 paper "Model Extrapolation Expedites Alignment" (☆75, updated May 20, 2025)
- ☆124, updated Feb 3, 2025
- LaTeX thesis template for CS undergraduates, Fudan University, 2022 (☆20, updated Jul 28, 2024)
- CMMLU: Measuring Massive Multitask Language Understanding in Chinese (☆806, updated Dec 6, 2024)
- FlagEval, an evaluation toolkit for large AI foundation models (☆338, updated Apr 24, 2025)