AI45Lab / IS-Bench
Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
☆28Updated last month
Alternatives and similar repositories for IS-Bench
Users who are interested in IS-Bench are comparing it to the repositories listed below
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆184Updated 4 months ago
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight)☆62Updated 3 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety☆50Updated 2 months ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)☆48Updated 5 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆24Updated 3 months ago
- ☆167Updated 4 months ago
- ☆21Updated 2 months ago
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents.☆186Updated 2 months ago
- ☆107Updated 2 weeks ago
- ☆218Updated last week
- ICLR 2025 Agent-Related Papers☆74Updated 10 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI☆167Updated 2 weeks ago
- repo for paper https://arxiv.org/abs/2504.13837☆193Updated 3 months ago
- Codes for paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents"☆47Updated 7 months ago
- Official repo of Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics☆36Updated last month
- A Self-Training Framework for Vision-Language Reasoning☆84Updated 8 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆331Updated 2 months ago
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models☆56Updated 3 months ago
- [AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆40Updated 4 months ago
- ☆28Updated 7 months ago
- Segment Policy Optimization: Improved Credit Assignment in Reinforcement Learning for LLMs☆34Updated last week
- [arXiv2505] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains☆50Updated last month
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!☆51Updated 6 months ago
- [NeurIPS 25] The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning☆22Updated this week
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"☆28Updated 2 months ago
- Code for ICLR 2025 Paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment"☆17Updated 7 months ago
- Official Repository of LatentSeek☆60Updated 3 months ago
- ☆69Updated 10 months ago
- Official Repository of "Learning what reinforcement learning can't"☆66Updated 2 weeks ago
- [ICML 2025] Official Implementation of GLIDER☆57Updated 4 months ago