shengyin1224 / SafeAgentBench
Code for the paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents"
☆21Updated this week
Alternatives and similar repositories for SafeAgentBench:
Users who are interested in SafeAgentBench are comparing it to the repositories listed below
- Official code for the paper: Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld☆52Updated 4 months ago
- Official repo of VLABench, a large scale benchmark designed for fairly evaluating VLA, Embodied Agent, and VLMs.☆144Updated this week
- ☆40Updated 3 weeks ago
- Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral)☆174Updated last month
- ☆124Updated 7 months ago
- HAZARD challenge☆28Updated 9 months ago
- LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents (ICLR 2024)☆66Updated 6 months ago
- Official Implementation of ReALFRED (ECCV'24)☆37Updated 4 months ago
- Accepted by ECCV 2024☆105Updated 4 months ago
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"☆75Updated last year
- The reinforcement learning codes for dataset SPA-VL☆31Updated 8 months ago
- [CVPR2024] This is the official implementation of MP5☆96Updated 8 months ago
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu…☆82Updated last month
- ICLR 2025 Agent-Related Papers☆46Updated 3 months ago
- Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety☆30Updated last month
- [arXiv 2023] Embodied Task Planning with Large Language Models☆170Updated last year
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…☆40Updated 8 months ago
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)☆131Updated last week
- [ICLR 2025] Dissecting Adversarial Robustness of Multimodal LM Agents☆68Updated last week
- A most Frontend Collection and survey of vision-language model papers, and models GitHub repository☆64Updated last week
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)☆156Updated last month
- ☆66Updated 2 months ago
- ☆29Updated 5 months ago
- ProgPrompt for Virtualhome☆126Updated last year
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…☆270Updated 3 months ago
- The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024)☆112Updated 7 months ago
- ☆30Updated 3 months ago
- ☆29Updated 4 months ago
- Up-to-date curated list of state-of-the-art research work, papers & resources on hallucinations in large vision-language models☆95Updated last week
- ☆41Updated 2 months ago