JAILJUDGE: A comprehensive evaluation benchmark covering a wide range of risk scenarios with complex malicious prompts (e.g., synthetic, adversarial, in-the-wild, and multi-language scenarios), along with high-quality human-annotated test datasets.
☆59 · Dec 13, 2024 · Updated last year
Alternatives and similar repositories for Jailjudge
Users that are interested in Jailjudge are comparing it to the libraries listed below.
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆160 · Nov 30, 2024 · Updated last year
- ☆33 · Aug 24, 2023 · Updated 2 years ago
- ☆12 · Feb 19, 2024 · Updated 2 years ago
- Official codes of KDD'24 paper "HiFGL: A Hierarchical Framework for Cross-silo Cross-device Federated Graph Learning" ☆10 · Sep 4, 2024 · Updated last year
- [NeurIPS2025] Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition ☆32 · Oct 21, 2025 · Updated 5 months ago
- UUKG: Unified Urban Knowledge Graph Dataset for Knowledge-Enhanced Urban Spatiotemporal Prediction ☆120 · Apr 29, 2025 · Updated 11 months ago
- Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization ☆22 · Dec 13, 2024 · Updated last year
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- The code implementation of MuScleLoRA (Accepted in ACL 2024) ☆10 · Dec 1, 2024 · Updated last year
- The code implementation of GraCeFul (Accepted in COLING 2025) ☆13 · Jan 27, 2025 · Updated last year
- ☆29 · Dec 19, 2025 · Updated 3 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆55 · Apr 4, 2025 · Updated 11 months ago
- ☆15 · Jun 7, 2024 · Updated last year
- [NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning ☆17 · Sep 20, 2025 · Updated 6 months ago
- ☆32 · Oct 18, 2024 · Updated last year
- ☆23 · Feb 2, 2022 · Updated 4 years ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types ☆24 · Nov 29, 2024 · Updated last year
- ☆12 · Apr 13, 2017 · Updated 8 years ago
- ☆16 · Jun 3, 2025 · Updated 9 months ago
- 🔥🔥🔥 [NeurIPS2025] MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem ☆495 · Feb 28, 2026 · Updated last month
- ☆165 · Sep 2, 2024 · Updated last year
- KITE (Knowledge-Intensive Task Evaluation) is an end-to-end benchmark for RAG pipelines ☆23 · Aug 14, 2024 · Updated last year
- ☆25 · Nov 19, 2025 · Updated 4 months ago
- ☆62 · Jul 14, 2025 · Updated 8 months ago
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e… ☆11 · Dec 27, 2024 · Updated last year
- Official Implementation of Avoiding spurious correlations via logit correction ☆17 · May 6, 2023 · Updated 2 years ago
- Exploring Multimodal LLM to generate or enhance wikiHow. ☆11 · May 30, 2024 · Updated last year
- Meta-Analysis of Robust04 Papers (Yang et al., SIGIR 2019) ☆12 · May 25, 2019 · Updated 6 years ago
- Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs). ☆66 · Mar 23, 2026 · Updated last week
- Third Person Shooter for Unity ☆12 · Jun 26, 2022 · Updated 3 years ago
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆557 · Apr 4, 2025 · Updated 11 months ago
- Text generation using language models with multiple exit heads ☆16 · Sep 18, 2025 · Updated 6 months ago
- [CVPR 2023] SGTAPose: Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence ☆19 · Jan 18, 2024 · Updated 2 years ago
- ☆12 · Sep 5, 2022 · Updated 3 years ago
- A beginner's introductory tutorial on model compression ☆22 · Jul 7, 2024 · Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆152 · Jul 19, 2024 · Updated last year
- The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding" ☆22 · Jun 26, 2025 · Updated 9 months ago
- A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling ☆15 · Dec 5, 2023 · Updated 2 years ago
- ☆25 · Sep 3, 2025 · Updated 6 months ago