S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
☆111Feb 13, 2026Updated last month
Alternatives and similar repositories for S-Eval
Users that are interested in S-Eval are comparing it to the libraries listed below.
- ☆20May 31, 2024Updated last year
- Flames is a highly adversarial Chinese-language benchmark for evaluating LLM harmlessness, developed by Shanghai AI Lab and Fudan NLP Group.☆63May 21, 2024Updated last year
- SC-Safety: a multi-turn adversarial safety benchmark for Chinese large language models☆149Mar 15, 2024Updated 2 years ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆25Nov 29, 2024Updated last year
- Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings☆19Sep 1, 2025Updated 7 months ago
- A curated list of awesome publications and researchers on prompting framework updated and maintained by The Intelligent System Security (…☆87Jan 14, 2025Updated last year
- ☆10Mar 13, 2023Updated 3 years ago
- Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]☆281Jul 28, 2025Updated 8 months ago
- [ACL 2024] SALAD benchmark & MD-Judge☆172Mar 8, 2025Updated last year
- AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.☆237Aug 29, 2025Updated 7 months ago
- Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts☆576Feb 27, 2026Updated last month
- Code release for RobOT (ICSE'21)☆15Dec 5, 2022Updated 3 years ago
- Code release for DeepJudge (S&P'22)☆52Mar 14, 2023Updated 3 years ago
- LLM evaluation.☆16Nov 7, 2023Updated 2 years ago
- ☆27Feb 1, 2023Updated 3 years ago
- [S&P 2026] SoK: Evaluating Jailbreak Guardrails for Large Language Models☆37Dec 17, 2025Updated 3 months ago
- White-box Fairness Testing through Adversarial Sampling☆14Apr 16, 2021Updated 4 years ago
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1☆13Apr 23, 2025Updated 11 months ago
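- "Stones from other hills may serve to polish jade": Fudan Whitzard AI (复旦白泽智能) releases JADE-DB, a demo dataset targeting Chinese open-source and overseas commercial large language models☆505Nov 18, 2025Updated 4 months ago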
- ☆35Jan 7, 2025Updated last year
- The code implementation of MuScleLoRA (Accepted in ACL 2024)☆10Dec 1, 2024Updated last year
- Code to enable layer-level steering in LLMs using sparse auto encoders☆31Sep 18, 2025Updated 6 months ago
- ☆22Jan 14, 2025Updated last year
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs☆321Jun 7, 2024Updated last year
- Instruction Following Eval☆16Jan 16, 2025Updated last year
- Implementation of TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems (https://arxiv.org/pdf/190…☆19Apr 13, 2023Updated 2 years ago
- SuperCLUE-Agent: a benchmark for evaluating core agent capabilities on native Chinese tasks☆94Nov 9, 2023Updated 2 years ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆101Jan 11, 2026Updated 3 months ago
- The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"☆10Jun 23, 2024Updated last year
- Accepted by ECCV 2024☆198Oct 15, 2024Updated last year
- Tzer: TVM Implementation of "Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation" (OOPSLA'22).☆12Jan 15, 2022Updated 4 years ago
- ☆48Feb 25, 2026Updated last month
- ☆19Jun 27, 2021Updated 4 years ago
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"☆14Nov 22, 2024Updated last year
- ☆11Jan 3, 2024Updated 2 years ago
- Code for paper: AdvKnn: Adversarial Attacks On K-Nearest Neighbor Classifiers With Approximate Gradients☆14Dec 23, 2019Updated 6 years ago
- [ACL 2026 Findings] CoV: Chain-of-View Prompting for Spatial Reasoning☆52Updated this week
- ☆27Jun 5, 2024Updated last year
- [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images?☆21Oct 20, 2025Updated 5 months ago