AI45Lab / IS-Bench
Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
☆26 · Updated 3 weeks ago
Alternatives and similar repositories for IS-Bench
Users who are interested in IS-Bench are comparing it to the repositories listed below.
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆136 · Updated last month
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight) ☆55 · Updated 2 months ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆172 · Updated 3 months ago
- ☆104 · Updated last month
- ICLR 2025 Agent-Related Papers ☆73 · Updated 9 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning! ☆48 · Updated 5 months ago
- ☆21 · Updated last month
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents. ☆179 · Updated last month
- ☆209 · Updated 2 weeks ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025) ☆42 · Updated 4 months ago
- Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs" ☆61 · Updated last month
- ☆163 · Updated 3 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆56 · Updated 3 months ago
- ☆79 · Updated last month
- (ACL 2025) 🔥🔥🔥 Code for "Empowering Multimodal Large Language Models with Evol-Instruct" ☆17 · Updated 3 months ago
- repo for paper https://arxiv.org/abs/2504.13837 ☆184 · Updated 2 months ago
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w… ☆72 · Updated 2 weeks ago
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond ☆286 · Updated 2 weeks ago
- A Self-Training Framework for Vision-Language Reasoning ☆82 · Updated 7 months ago
- ☆46 · Updated this week
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆310 · Updated last month
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆69 · Updated 5 months ago
- ☆27 · Updated 6 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆153 · Updated 2 weeks ago
- Official Repository of LatentSeek ☆60 · Updated 2 months ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning" ☆27 · Updated last month
- Official Repository of "Learning what reinforcement learning can't" ☆64 · Updated last week
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆285 · Updated last month
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models ☆48 · Updated 3 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆48 · Updated last month