RockyHHH / Safety-Evaluating
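This work proposes a safety evaluation benchmark for Chinese LLMs based on ERNIE Bot (文心一言), covering 8 typical safety scenarios and 6 types of instruction attacks. It also proposes a safety evaluation framework and process that uses test prompts both written by hand and collected from open-source data, and that combines human intervention with the LLM's strong evaluation ability, using the LLM as a "co-evaluator".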
☆20 Updated last year
Related projects
Alternatives and complementary repositories for Safety-Evaluating
- SC-Safety: a multi-turn adversarial safety benchmark for Chinese large language models ☆107 Updated 8 months ago
- A Chinese self-instruct dataset built with ChatGPT ☆113 Updated last year
- Source code for ACL 2023 paper Decoder Tuning: Efficient Language Understanding as Decoding ☆48 Updated last year
- Instruction fine-tuning of large models with a single codebase ☆37 Updated last year
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆159 Updated 5 months ago
- 1.4B sLLM for Chinese and English - HammerLLM🔨 ☆43 Updated 7 months ago
- ☆23 Updated last year
- make LLM easier to use ☆58 Updated last year
- ☆20 Updated 4 months ago
- ☆93 Updated 8 months ago
- Chinese Generation Evaluation ☆12 Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆67 Updated last week
- Construction of the GoGPT Chinese instruction dataset ☆10 Updated 9 months ago
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆158 Updated last month
- Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models" ☆56 Updated 2 months ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆62 Updated last year
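- Open-domain, multi-turn evaluation benchmark for general-purpose Chinese large models | An Open Domain Benchmark for Foundation Models in Chinese ☆76 Updated last year
- Code implementation of Baichuan Dynamic NTK-ALiBi: inference on longer texts without fine-tuning ☆46 Updated last year
- General-purpose simple utilities project ☆14 Updated last month
- WuDao ("悟道") data ☆39 Updated 3 years ago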
- Flames is a highly adversarial benchmark in Chinese for LLM harmlessness evaluation, developed by Shanghai AI Lab and Fudan NLP Group. ☆33 Updated 6 months ago
- Chinese large language model evaluation, phase 2 ☆70 Updated last year
- ☆40 Updated 5 months ago
- ☆129 Updated 4 months ago
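- Code LLM data processing for pre-training, fine-tuning, and DPO; an industry SOTA processing pipeline ☆27 Updated 4 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve … ☆27 Updated 3 months ago
- JailBench: a Chinese dataset for evaluating the jailbreak attack risks of large language models ☆22 Updated 4 months ago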
- This repository open-sources our GEC system submitted by THU KELab (sz) in the CCL2023-CLTC Track 1: Multidimensional Chinese Learner Tex… ☆14 Updated 11 months ago
- This project uses pretrained models such as BERT to implement the multiple-choice machine reading comprehension task (Multiple Choice MRC) ☆15 Updated 3 years ago