AI45Lab / IS-Bench
[AAAI 2026] Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
☆40Updated 2 months ago
Alternatives and similar repositories for IS-Bench
Users who are interested in IS-Bench are comparing it to the repositories listed below:
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI☆250Updated 3 months ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)☆67Updated 9 months ago
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight)☆84Updated last month
- Training VLM agents with multi-turn reinforcement learning☆391Updated this week
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety☆53Updated 6 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆30Updated 7 months ago
- 🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL☆59Updated 5 months ago
- Official repository for "CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation"☆64Updated last month
- ☆21Updated 6 months ago
- Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning☆130Updated last week
- A Self-Training Framework for Vision-Language Reasoning☆88Updated last year
- A Diagnostic Guardrail Framework for AI Agent Safety and Security☆316Updated this week
- [ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.1…☆17Updated 6 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w…☆99Updated 4 months ago
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆220Updated 9 months ago
- Official Repository of LatentSeek☆76Updated 8 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆142Updated last week
- ☆113Updated 4 months ago
- Codes for paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents"☆63Updated 11 months ago
- A paper list of Awesome Latent Space.☆319Updated this week
- [NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains☆74Updated 6 months ago
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning☆96Updated 4 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!☆53Updated 10 months ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".☆83Updated 6 months ago
- [NeurIPS 25] The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning☆25Updated 4 months ago
- ☆63Updated 6 months ago
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents.☆262Updated 3 months ago
- [ICML 2025] Official Implementation of GLIDER☆72Updated 3 months ago
- Official codebase for the paper Latent Visual Reasoning☆105Updated 3 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆104Updated 4 months ago