zli12321 / VLM-surveys

An up-to-date collection and survey of vision-language model papers and models (GitHub repository)

☆64

Alternatives and similar repositories for VLM-surveys:

Users who are interested in VLM-surveys compare it to the repositories listed below.

leofan90 / Awesome-World-Models
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and A…
☆75 Updated this week
WayneMao / RoboMatrix
The Official Implementation of RoboMatrix
☆84 Updated last month
OpenMOSS / VLABench
Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLAs, embodied agents, and VLMs.
☆144 Updated this week
zhangyuejoslin / VLN-Survey-with-Foundation-Models
[TMLR 2024] Repository for VLN with foundation models
☆50 Updated last month
GengzeZhou / NavGPT-2
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
☆119 Updated 5 months ago
GengzeZhou / NavGPT
[AAAI 2024] Official implementation of NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
☆195 Updated last year
Ram81 / goat-bench
☆75 Updated 7 months ago
snumprlab / realfred
Official Implementation of ReALFRED (ECCV'24)
☆37 Updated 4 months ago
HCPLab-SYSU / LH-VLN
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method (CVPR-25)
☆22 Updated this week
JeremyLinky / YouTube-VLN
[ICCV'23] Learning Vision-and-Language Navigation from YouTube Videos
☆50 Updated 2 months ago
Stanford-ILIAD / explore-eqa
Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"
☆43 Updated 7 months ago
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆124 Updated 2 months ago
HaochenZ11 / VLA-3D
☆49 Updated last month
UMass-Embodied-AGI / MultiPLY
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
☆126 Updated 4 months ago
OpenDriveLab / CLOVER
[NeurIPS 2024] CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation
☆98 Updated 2 months ago
aiming-lab / GRAPE
GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization
☆86 Updated last month
OpenRobotLab / VLM-Grounder
[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
☆85 Updated 3 months ago
wzcai99 / Pixel-Navigator
Official GitHub Repository for Paper "Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill", …
☆85 Updated 4 months ago
liufanfanlff / RoboUniview
☆50 Updated last week
embodied-agent-interface / embodied-agent-interface
Embodied Agent Interface (EAI): Benchmarking LLMs for Embodied Decision Making (NeurIPS D&B 2024 Oral)
☆174 Updated last month
LaVi-Lab / NaviLLM
[CVPR 2024] The code for the paper 'Towards Learning a Generalist Model for Embodied Navigation'
☆33 Updated 10 months ago
changhaonan / A3VLM
[CoRL 2024] Official repo of `A3VLM: Actionable Articulation-Aware Vision Language Model`
☆104 Updated 4 months ago
zd11024 / NaviLLM
[CVPR 2024] The code for the paper 'Towards Learning a Generalist Model for Embodied Navigation'
☆160 Updated 8 months ago
raphael-sch / VELMA
VELMA agent for VLN in Street View
☆16 Updated last year
lmzpai / roboMamba
The repo for the paper `RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation`
☆87 Updated 2 months ago
allenai / PoliFormer
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
☆65 Updated 3 months ago
yueyang130 / DeeR-VLA
Official code for the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution"
☆72 Updated 2 weeks ago
jzhzhang / NaVid-VLN-CE
[RSS 2024] NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
☆83 Updated 3 weeks ago