AI45Lab / IS-Bench
Data and code for the paper "IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks"
☆29 · Updated 2 months ago
Alternatives and similar repositories for IS-Bench
Users interested in IS-Bench are comparing it to the libraries listed below.
- ICLR 2025 Agent-Related Papers ☆73 · Updated 11 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆186 · Updated this week
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight) ☆65 · Updated 4 months ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆192 · Updated 5 months ago
- ☆108 · Updated last month
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆26 · Updated 3 months ago
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents. ☆194 · Updated 3 months ago
- ☆21 · Updated 2 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆51 · Updated 2 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆345 · Updated 3 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆86 · Updated 8 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w… ☆81 · Updated last month
- [NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models ☆61 · Updated 4 months ago
- Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs" ☆65 · Updated 2 weeks ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025) ☆53 · Updated 6 months ago
- ☆171 · Updated 5 months ago
- ☆228 · Updated this week
- (ACL 2025) 🔥🔥🔥 Code for "Empowering Multimodal Large Language Models with Evol-Instruct" ☆18 · Updated 5 months ago
- Repo for paper https://arxiv.org/abs/2504.13837 ☆199 · Updated 3 months ago
- ☆59 · Updated last month
- ☆51 · Updated 3 months ago
- [NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains ☆57 · Updated 2 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25] ☆157 · Updated 4 months ago
- A comprehensive collection of process reward models. ☆111 · Updated 2 weeks ago
- ☆28 · Updated 8 months ago
- Official repository of RiOSWorld ☆41 · Updated last week
- Official Repository of LatentSeek ☆64 · Updated 4 months ago
- Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning ☆88 · Updated this week
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆37 · Updated last month
- Official repository for "CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation" ☆34 · Updated last month