Zhoues / RoboRefer

[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"

☆205

Alternatives and similar repositories for RoboRefer

Users who are interested in RoboRefer are comparing it to the repositories listed below.

Zhangwenyao1 / DreamVLA
[NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
☆245 Updated 2 months ago
InternRobotics / InternVLA-M1
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
☆296 Updated 3 weeks ago
pickxiguapi / Embodied-R1
Official code for "Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation"
☆108 Updated 3 months ago
qizekun / SoFar
[NeurIPS 2025 Spotlight] SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
☆208 Updated 5 months ago
PKU-HMI-Lab / Hybrid-VLA
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
☆324 Updated 2 months ago
baaivision / UniVLA
Unified Vision-Language-Action Model
☆245 Updated last month
OpenHelix-Team / VLA-RFT
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning
☆96 Updated 2 months ago
EmbodiedCity / Embodied-R.code
☆86 Updated 6 months ago
CladernyJorn / UP-VLA
Official PyTorch implementation for ICML 2025 paper: UP-VLA.
☆51 Updated 5 months ago
OpenHelix-Team / LLaVA-VLA
LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained🔥]
☆173 Updated last month
RoboDita / Dita
ICCV2025
☆143 Updated 3 weeks ago
declare-lab / Emma-X
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
☆78 Updated 6 months ago
PKU-HMI-Lab / LIFT3D
[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
☆170 Updated 5 months ago
Fanqi-Lin / OneTwoVLA
Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning"
☆200 Updated 6 months ago
xiaoxiao0406 / VQ-VLA
The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025)
☆101 Updated 3 weeks ago
InternRobotics / F1-VLA
F1: A Vision Language Action Model Bridging Understanding and Generation to Actions
☆137 Updated last month
InternRobotics / InstructVLA
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
☆70 Updated 2 months ago
liufanfanlff / RoboUniview
☆61 Updated 9 months ago
MCG-NJU / Tra-MoE
[CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
☆51 Updated 8 months ago
BAAI-DCAI / SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models."
☆319 Updated 2 months ago
InternRobotics / OST-Bench
[NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
☆68 Updated 2 months ago
TencentARC / Moto
[ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
☆150 Updated 2 months ago
hume-vla / hume
🦾 A Dual-System VLA with System2 Thinking
☆122 Updated 3 months ago
sg-3d / sg3d
☆52 Updated last year
InternRobotics / VLM-Grounder
[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
☆122 Updated 6 months ago
OpenDriveLab / CLOVER
[NeurIPS 2024] CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation
☆130 Updated 3 months ago
AIGeeksGroup / Nav-R1
Nav-R1: Reasoning and Navigation in Embodied Scenes
☆75 Updated last month
JiuTian-VL / Large-VLM-based-VLA-for-Robotic-Manipulation
A curated list of large VLM-based VLA models for robotic manipulation.
☆280 Updated 2 weeks ago
ShuangLI59 / unified_video_action
Official PyTorch Implementation of Unified Video Action Model (RSS 2025)
☆303 Updated 4 months ago
NVlabs / RoboSpatial
☆122 Updated 2 months ago