alipay / YiJian-Community
YiJian-Community: a full-process, automated large model safety evaluation tool designed for academic research
⭐ 71 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for YiJian-Community
- Accepted by IJCAI-24 Survey Track (⭐ 153, updated 2 months ago)
- Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources (⭐ 123, updated this week)
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] (⭐ 156, updated last month)
- Accepted by ECCV 2024 (⭐ 71, updated 3 weeks ago)
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Datasets and Benchmarks Track) (⭐ 101, updated this week)
- Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A) (⭐ 16, updated 4 months ago)
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn Attacker for LLMs (⭐ 17, updated 4 months ago)
- S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models (⭐ 41, updated last week)
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey (⭐ 76, updated 3 months ago)
- A package that achieves a 95%+ transfer attack success rate against GPT-4 (⭐ 12, updated 2 weeks ago)
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety [ACL 2024] (⭐ 154, updated 4 months ago)
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … (⭐ 13, updated 2 weeks ago)
- [ICLR'24 Spotlight] The official code of our work on AIGC detection: "Multiscale Positive-Unlabeled Detection of AI-Generated Texts" (⭐ 105, updated 10 months ago)
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting (⭐ 12, updated 7 months ago)
- JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and further assess … (⭐ 35, updated 3 months ago)
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate" (⭐ 77, updated last week)
- Flames: a highly adversarial Chinese benchmark for LLM harmlessness evaluation, developed by Shanghai AI Lab and the Fudan NLP Group (⭐ 33, updated 5 months ago)
- This is an official repository of "VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models" (NeurIPS 2… (⭐ 39, updated last week)
- An attack for inducing hallucinations in LLMs (⭐ 104, updated 5 months ago)
- [ICLR 2024 Spotlight] [Best Paper Award, SoCal NLP 2023] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal… (⭐ 24, updated 5 months ago)
- A survey on harmful fine-tuning attacks for large language models (⭐ 69, updated this week)
- Repository for the paper (AAAI 2024, Oral): Visual Adversarial Examples Jailbreak Large Language Models (⭐ 180, updated 5 months ago)
- Code for the paper "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models" (IEEE ICASSP 2024); demo at //124.220.228.133:11107 (⭐ 12, updated 3 months ago)