Zhoues / RoboRefer
Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆23 · Updated this week
Alternatives and similar repositories for RoboRefer
Users interested in RoboRefer are comparing it to the libraries listed below
- [World-Model-Survey-2024] Paper list and projects for World Model ☆11 · Updated 7 months ago
- ☆39 · Updated this week
- RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints ☆48 · Updated last week
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆26 · Updated this week
- A paper list for spatial reasoning ☆82 · Updated this week
- Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLA, Embodied Agent, and VLMs. ☆227 · Updated last week
- Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents. (ICML 2025) ☆127 · Updated this week
- [CVPR 2024] This is the official implementation of MP5 ☆102 · Updated 11 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆128 · Updated last month
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ☆89 · Updated 4 months ago
- Latent Motion Token as the Bridging Language for Robot Manipulation ☆89 · Updated 3 weeks ago
- A python script for downloading huggingface datasets and models. ☆19 · Updated last month
- GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization ☆129 · Updated 2 months ago
- [NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking" ☆35 · Updated 6 months ago
- Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning. ☆96 · Updated 2 weeks ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆29 · Updated 6 months ago
- Online RL with Simple Reward Enables Training VLA Models with Only One Trajectory ☆157 · Updated last week
- ☆69 · Updated 6 months ago
- ☆46 · Updated 5 months ago
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆65 · Updated 3 weeks ago
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆133 · Updated 2 weeks ago
- A tiny paper rating web ☆37 · Updated 2 months ago
- ☆99 · Updated 2 weeks ago
- Official code for the paper: Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld ☆57 · Updated 8 months ago
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆157 · Updated 2 weeks ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks ☆122 · Updated last week
- [ICLR 2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆54 · Updated 3 months ago
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆108 · Updated this week
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning ☆14 · Updated last week
- ☆124 · Updated last year