shengyin1224 / SafeAgentBench
Code for the paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents"
☆51Updated 7 months ago
Alternatives and similar repositories for SafeAgentBench
Users who are interested in SafeAgentBench are comparing it to the libraries listed below.
- Official repo of Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics☆38Updated 2 months ago
- This is the official repository for the ICLR 2025 accepted paper Badrobot: Manipulating Embodied LLMs in the Physical World.☆34Updated 3 months ago
- ☆60Updated 4 months ago
- ☆21Updated 2 months ago
- HAZARD challenge☆36Updated 5 months ago
- ☆49Updated 8 months ago
- Official code for the paper: Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld☆58Updated last year
- Responsible Robotic Manipulation☆12Updated last month
- Focused on the safety and security of Embodied AI☆64Updated 3 months ago
- Official Implementation of FLARE (AAAI'25 Oral)☆22Updated 7 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆26Updated 3 months ago
- Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks☆29Updated 2 months ago
- [ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents.☆194Updated 3 months ago
- LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents (ICLR 2024)☆79Updated 4 months ago
- Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents☆21Updated 10 months ago
- Official code for the paper: WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents☆48Updated 5 months ago
- A toolbox for benchmarking Multimodal LLM Agents trustworthiness across truthfulness, controllability, safety and privacy dimensions thro…☆53Updated 3 months ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety☆51Updated 2 months ago
- Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral)☆263Updated 7 months ago
- [CVPR2024] This is the official implementation of MP5☆104Updated last year
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)☆167Updated 3 months ago
- Source codes for the paper "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation"☆42Updated 7 months ago
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula…☆95Updated 4 months ago
- Evaluate Multimodal LLMs as Embodied Agents☆54Updated 8 months ago
- ☆131Updated last year
- ☆34Updated last year
- ☆38Updated 4 months ago
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents☆110Updated 8 months ago
- ☆54Updated last year
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning☆25Updated this week