AI45Lab / IS-Bench
Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
☆31 · Updated this week
Alternatives and similar repositories for IS-Bench
Users that are interested in IS-Bench are comparing it to the libraries listed below
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆53 · Updated 4 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆30 · Updated 5 months ago
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight) ☆73 · Updated 5 months ago
- ICLR 2025 Agent-Related Papers ☆72 · Updated last year
- [NeurIPS 2025] Official repository of RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents ☆47 · Updated 2 weeks ago
- Codes for paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents" ☆57 · Updated 9 months ago
- ☆21 · Updated 4 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆205 · Updated last month
- Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs" ☆71 · Updated last month
- A toolbox for benchmarking Multimodal LLM Agents trustworthiness across truthfulness, controllability, safety and privacy dimensions thro… ☆57 · Updated 5 months ago
- Training VLM agents with multi-turn reinforcement learning ☆324 · Updated 3 weeks ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025) ☆59 · Updated 7 months ago
- Official repo of Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics ☆50 · Updated 3 months ago
- [AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data ☆32 · Updated 7 months ago
- [ICML 2025] Official Implementation of GLIDER ☆67 · Updated last month
- ☆184 · Updated 6 months ago
- ☆109 · Updated 2 months ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆203 · Updated 6 months ago
- 🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks. ☆28 · Updated this week
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆68 · Updated last year
- Official Repository of "Learning what reinforcement learning can't" ☆69 · Updated last week
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg… ☆164 · Updated last month
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents. ☆218 · Updated last month
- The reinforcement learning codes for dataset SPA-VL ☆42 · Updated last year
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆48 · Updated 5 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆387 · Updated 4 months ago
- Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning ☆109 · Updated last month