Embodied-VideoAgent/embodied-videoagent

☆49

Alternatives and similar repositories for embodied-videoagent

Users who are interested in embodied-videoagent are comparing it to the repositories listed below.

sg-3d / sg3d
☆55 · Oct 3, 2024 · Updated last year
MM-FIRE / FIRE
☆13 · Nov 5, 2024 · Updated last year
MTU3D / MTU3D
☆266 · Aug 6, 2025 · Updated 11 months ago
Pengxiang-Li / MacroClaw
☆15 · Mar 16, 2026 · Updated 4 months ago
AnjieCheng / SR-3D
[ICLR'26] This repository is the implementation of "3D Aware Region Prompted Vision Language Model"
☆28 · Feb 19, 2026 · Updated 5 months ago
Philip-MIT / rover-vlm
☆18 · Dec 1, 2025 · Updated 7 months ago
OpenHelix-Team / VLA-2
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
☆31 · Nov 3, 2025 · Updated 8 months ago
clova-tool / CLOVA-tool
☆30 · Jun 19, 2024 · Updated 2 years ago
gbliao / SPC-GS
[CVPR 2025] SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs
☆20 · Aug 27, 2025 · Updated 10 months ago
UMass-Embodied-AGI / 3D-Mem
[CVPR 2025] Source codes for the paper "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning"
☆266 · Oct 2, 2025 · Updated 9 months ago
zoezheng126 / Spatio-Temporal-LLM
☆19 · Aug 7, 2025 · Updated 11 months ago
mat-agent / MAT-Agent
MAT: Multi-modal Agent Tuning (ICLR 2025 Spotlight)
☆97 · Dec 18, 2025 · Updated 7 months ago
jianglongye / featurenerf
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models, ICCV 2023
☆13 · Jul 13, 2024 · Updated 2 years ago
zyp123494 / DynaVol
DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization (ICLR 2024) & DynaVol-S: Dynamic Scene Understanding…
☆21 · Apr 10, 2025 · Updated last year
SceneCOT / scenecot
[ICLR 2026] SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
☆27 · Mar 22, 2026 · Updated 3 months ago
alexeybokhovkin / SceneFactor
We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortles…
☆106 · Apr 29, 2026 · Updated 2 months ago
beacon-3d / Beacon3D
[CVPR 2025] Beacon3D: Object-centric Evaluation for 3D Grounding-QA
☆28 · Nov 25, 2025 · Updated 7 months ago
CASAGPT / CASA-GPT
PyTorch implementation of the paper: CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design [CVPR 2025]
☆15 · Apr 5, 2025 · Updated last year
Teacher-Tom / HSGM_public
[CVPR 2026] Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation
☆21 · Jun 11, 2026 · Updated last month
HaozheZhao / MIC_tool
☆14 · Nov 14, 2023 · Updated 2 years ago
kimren227 / DiffConvex
☆19 · Jul 20, 2024 · Updated 2 years ago
Reagan1311 / Aff-Grasp
Learning Precise Affordances from Egocentric Videos for Robotic Manipulation (ICCV 2025)
☆26 · Jan 30, 2026 · Updated 5 months ago
recuriosity / recuriosity
Code for the paper "Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration"
☆56 · May 22, 2026 · Updated last month
neu-vi / struct2d
Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)
☆31 · Oct 28, 2025 · Updated 8 months ago
XingruiWang / DynSuperCLEVR
A video question answering dataset that focuses on the dynamic properties of objects (velocity, acceleration) and their collisions withi…
☆20 · Apr 23, 2025 · Updated last year
DennisRotondi / FunGraph
[IROS'25] Official implementation of the paper FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
☆17 · Oct 11, 2025 · Updated 9 months ago
Chenyu-Wang567 / All-Angles-Bench
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
☆69 · Mar 22, 2026 · Updated 3 months ago
sharinka0715 / FlowDreamer
[RA-L 2026] Official implementation of the paper "FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipu…
☆19 · Jan 19, 2026 · Updated 6 months ago
HybridRobotics / MomaGraph
[ICLR 2026 Oral] MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
☆56 · Mar 4, 2026 · Updated 4 months ago
meta-scenes / MetaScenes
☆68 · Dec 3, 2025 · Updated 7 months ago
nuomizai / T2VLM
[ICCV'25] T²-VLM: Training-Free Generation of Temporally Consistent Rewards from VLMs
☆16 · Jul 8, 2025 · Updated last year
Evm7 / ego4dlta-icvae
[WACV 2023] Intention-Conditioned Long-Term Human Egocentric Action Forecasting @ EGO4D Challenge 2022
☆14 · Sep 3, 2023 · Updated 2 years ago
ItsBaymax / Meta-Memory
Meta-Memory: Retrieving and Integrating Semantic-Spatial Memories for Robot Spatial Reasoning
☆16 · Nov 26, 2025 · Updated 7 months ago
ylwhxht / MSGNav
[CVPR 2026] MSGNav: Unleashing the Power of Multi-modal 3D Scene Graph for Zero-Shot Embodied Navigation
☆64 · Mar 23, 2026 · Updated 3 months ago
YueFan1014 / VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
☆320 · Dec 5, 2024 · Updated last year
AIGeeksGroup / Nav-R1
Nav-R1: Reasoning and Navigation in Embodied Scenes
☆128 · Oct 31, 2025 · Updated 8 months ago
hany01rye / tiger
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
☆23 · Nov 18, 2025 · Updated 8 months ago
shiyao-li / MAGICIAN
[CVPR 2026 (Oral)] MAGICIAN: Efficient Long-Term Planning with Imagined Gaussians for Active Mapping
☆156 · May 27, 2026 · Updated last month
SunnyYWD / AC-2-VLA
☆16 · Jan 27, 2026 · Updated 5 months ago