salman-lui / x-teaming
☆55 · May 21, 2025 · Updated 8 months ago
Alternatives and similar repositories for x-teaming
Users who are interested in x-teaming are comparing it to the libraries listed below.
- Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models … ☆15 · Sep 12, 2025 · Updated 5 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆29 · Jul 9, 2024 · Updated last year
- ☆121 · Feb 3, 2025 · Updated last year
- ☆26 · Mar 17, 2025 · Updated 10 months ago
- The official repository for guided jailbreak benchmark ☆28 · Jul 28, 2025 · Updated 6 months ago
- ☆12 · Jun 11, 2025 · Updated 8 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 7 months ago
- ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL (ICLR 2025 Pytorch Code) ☆17 · May 15, 2025 · Updated 9 months ago
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates ☆22 · Jul 1, 2025 · Updated 7 months ago
- ☆17 · Jan 5, 2026 · Updated last month
- ☆18 · Oct 20, 2024 · Updated last year
- Official repository of Graph RAG-Tool Fusion and ToolLinkOS dataset. ☆22 · Feb 13, 2025 · Updated last year
- Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs). ☆61 · Jan 19, 2026 · Updated 3 weeks ago
- ☆39 · May 17, 2025 · Updated 8 months ago
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489) ☆19 · Oct 22, 2024 · Updated last year
- ☆24 · May 23, 2025 · Updated 8 months ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding" ☆22 · Dec 8, 2024 · Updated last year
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" ☆16 · Mar 31, 2025 · Updated 10 months ago
- Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks" ☆36 · Dec 18, 2024 · Updated last year
- Agent-based implementation of RAG, incorporating AI agents into the RAG pipeline to orchestrate its components and perform additional act… ☆19 · Feb 20, 2025 · Updated 11 months ago
- [S&P 2026] SoK: Evaluating Jailbreak Guardrails for Large Language Models ☆35 · Dec 17, 2025 · Updated last month
- Accepted by CVPR 2025 (highlight) ☆22 · Jun 8, 2025 · Updated 8 months ago
- Synthesizing realistic and diverse text-datasets from augmented LLMs ☆16 · Jan 26, 2026 · Updated 2 weeks ago
- Control LLM ☆22 · Apr 6, 2025 · Updated 10 months ago
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails ☆30 · Feb 26, 2025 · Updated 11 months ago
- Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage" ☆26 · Dec 6, 2024 · Updated last year
- Official repository for "On the Multi-modal Vulnerability of Diffusion Models" ☆16 · Jul 15, 2024 · Updated last year
- The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding" ☆23 · Jun 26, 2025 · Updated 7 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated 9 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Aug 22, 2024 · Updated last year
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies ☆28 · Aug 14, 2024 · Updated last year
- Code for "TrustRAG: Enhancing Robustness and Trustworthiness in RAG" AAAI 2026 Workshop on Trust and Control in Agentic AI (TrustAgent) ☆52 · Mar 24, 2025 · Updated 10 months ago
- [NeurIPS 2024] An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding ☆21 · Oct 10, 2024 · Updated last year
- [AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework ☆44 · Jan 25, 2026 · Updated 3 weeks ago
- ☆47 · Feb 4, 2026 · Updated last week
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 · Nov 12, 2024 · Updated last year
- A new algorithm that formulates jailbreaking as a reasoning problem. ☆26 · Jul 2, 2025 · Updated 7 months ago
- ☆29 · May 22, 2025 · Updated 8 months ago
- Evaluating the faithfulness of long-context language models ☆30 · Oct 21, 2024 · Updated last year