RockyHHH / Safety-Evaluating
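This paper proposes a safety evaluation benchmark for Chinese LLMs based on ERNIE Bot ("文心一言"), covering 8 typical safety scenarios and 6 types of instruction attacks. It also presents a framework and process for safety evaluation that uses test prompts written by hand or collected from open-source data, and combines human intervention with the strong evaluation ability of an LLM acting as a "co-evaluator".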
☆32Sep 1, 2023Updated 2 years ago
Alternatives and similar repositories for Safety-Evaluating
Users that are interested in Safety-Evaluating are comparing it to the libraries listed below
- A white-box algorithm that generates adversarial examples according to the gradient☆11May 9, 2020Updated 5 years ago
- SC-Safety: a multi-turn adversarial safety benchmark for Chinese large language models☆150Mar 15, 2024Updated last year
- Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]☆272Jul 28, 2025Updated 6 months ago
- ☆11Dec 23, 2024Updated last year
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization☆29Jul 9, 2024Updated last year
- Full-reference image quality assessment based on convolutional activation maps.☆10Dec 24, 2020Updated 5 years ago
- ☆12May 6, 2022Updated 3 years ago
- Chinese safety prompts for evaluating and improving the safety of LLMs.☆1,127Feb 27, 2024Updated last year
- 【ACL 2024】 SALAD benchmark & MD-Judge☆170Mar 8, 2025Updated 11 months ago
- On the Robustness of GUI Grounding Models Against Image Attacks☆12Apr 8, 2025Updated 10 months ago
- ☆14Feb 26, 2025Updated 11 months ago
- Implementation of our paper published in Springer's Signal, Image and Video Processing☆11Dec 5, 2020Updated 5 years ago
- The implementation of our IEEE S&P 2024 paper "Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples".☆11Jun 28, 2024Updated last year
- [ICLR 2022] Boosting Randomized Smoothing with Variance Reduced Classifiers☆12Mar 29, 2022Updated 3 years ago
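- All source code for Yang Xiuzhang's book 《Python网络数据爬取及分析从入门到精通(分析篇)》 (Python web data crawling and analysis, from beginner to mastery: analysis volume), covering visualization, clustering, regression, classification, word clouds, and LDA analysis. All code has been updated to run on Python 3.☆11Aug 12, 2021Updated 4 years ago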
- Ant Financial natural language processing competition.☆10Sep 3, 2018Updated 7 years ago
- Official Code Implementation for the CCS 2022 Paper "On the Privacy Risks of Cell-Based NAS Architectures"☆11Nov 21, 2022Updated 3 years ago
- A research workbench for developing and testing attacks against large language models, with a focus on prompt injection vulnerabilities a…☆37Updated this week
- The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.☆13Dec 16, 2024Updated last year
- Using the Python Imaging Library (PIL, now Pillow) to generate colors and animate Moiré patterns.☆15Sep 9, 2025Updated 5 months ago
- ☆11Nov 12, 2024Updated last year
- ☆10Mar 31, 2022Updated 3 years ago
- AIGC report series, 2022-2023☆11Feb 25, 2024Updated last year
- ☆11Jul 5, 2023Updated 2 years ago
- The Project of Our ICCV Paper☆10Nov 10, 2020Updated 5 years ago
- Beyond Words: A Multimodal Exploration of Persuasion in Memes☆13Jun 8, 2024Updated last year
- [Preprint] Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis☆10Sep 23, 2021Updated 4 years ago
- Face recognition with softmax, SphereFace, CosFace, and ArcFace losses, in PyTorch for Python 3☆10Apr 27, 2020Updated 5 years ago
- ☆10Jun 29, 2020Updated 5 years ago
- [USENIX Security 2025] SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks☆19Sep 18, 2025Updated 4 months ago
- ☆14Jan 26, 2025Updated last year
- ArxivDaily☆13Updated this week
- A PyTorch training and inference framework for Baidu's UIE extraction model☆12Nov 20, 2024Updated last year
- Code for the CVPR '23 paper, "Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning"☆10Jun 9, 2023Updated 2 years ago
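- 🦄 A modern enterprise-grade RBAC permission management system built on React + Umi + Ant Design, supporting dynamic routing, menu permissions, action-level permission control, multiple languages, and multi-environment configuration, to help you quickly build enterprise management systems.☆18Feb 8, 2026Updated last week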
- Python wrapper around the ViBe C source code, built with Cython. ViBe: A universal background subtraction algorithm for video sequences☆10Nov 19, 2020Updated 5 years ago
- [NeurIPS 2024] "Membership Inference on Text-to-image Diffusion Models via Conditional Likelihood Discrepancy"☆12Sep 15, 2025Updated 5 months ago
- Code for Spectral Norm of Convolutional Layers with Circular and Zero Paddings and Efficient Bound of Lipschitz Constant for Convolutiona…☆15Feb 2, 2024Updated 2 years ago
- Multi-class AdaBoost algorithm (SAMME)☆10Nov 8, 2019Updated 6 years ago