AI45Lab / X-Boundary
The code repository for the paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability"
☆30 · Updated 4 months ago
Alternatives and similar repositories for X-Boundary
Users who are interested in X-Boundary are comparing it to the libraries listed below.
- Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning" ☆57 · Updated 4 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆48 · Updated 2 months ago
- ☆134 · Updated 4 months ago
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper) ☆282 · Updated last week
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks) ☆154 · Updated 3 weeks ago
- Accepted by ECCV 2024 ☆142 · Updated 9 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆60 · Updated last year
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ☆65 · Updated this week
- Accepted by IJCAI-24 Survey Track ☆207 · Updated 10 months ago
- The reinforcement learning codes for dataset SPA-VL ☆36 · Updated last year
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning … ☆60 · Updated last month
- ☆93 · Updated 5 months ago
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source… ☆57 · Updated 6 months ago
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg… ☆105 · Updated 3 weeks ago
- ☆29 · Updated last month
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆268 · Updated last week
- ☆50 · Updated last year
- Latest Advances on Long Chain-of-Thought Reasoning ☆432 · Updated last week
- Code for ICLR 2025 Paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment" ☆14 · Updated 5 months ago
- ☆52 · Updated 3 months ago
- ☆242 · Updated last week
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆154 · Updated 4 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆142 · Updated 2 months ago
- ☆102 · Updated last week
- 😎 up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ☆334 · Updated last week
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆84 · Updated last week
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆68 · Updated 2 months ago
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆145 · Updated 2 months ago
- ☆21 · Updated 4 months ago
- ☆35 · Updated 3 weeks ago