EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆378 Updated 10 months ago
Alternatives and similar repositories for EgoLife
Users who are interested in EgoLife are comparing it to the repositories listed below.
- Official repo and evaluation implementation of VSI-Bench ☆667 Updated 5 months ago
- VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆299 Updated 3 months ago
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆289 Updated last year
- This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning. ☆391 Updated last week
- Cambrian-S: Towards Spatial Supersensing in Video ☆482 Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆201 Updated 8 months ago
- [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning ☆255 Updated 3 months ago
- This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams" ☆266 Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆137 Updated 5 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆163 Updated 10 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆287 Updated last year
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆103 Updated 6 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆308 Updated last year
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆424 Updated last week
- A paper list for spatial reasoning ☆614 Updated last week
- A Large-scale Video Action Dataset ☆341 Updated 2 weeks ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆167 Updated 3 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25] ☆267 Updated 2 months ago
- Official code for MotionBench (CVPR 2025) ☆63 Updated 10 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆369 Updated 3 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆419 Updated 8 months ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆81 Updated 2 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆74 Updated last week
- TStar is a unified temporal search framework for long-form video question answering ☆86 Updated 4 months ago
- Structured Video Comprehension of Real-World Shorts ☆230 Updated 4 months ago
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video] ☆806 Updated last month
- A list of works on video generation towards world model ☆330 Updated 2 weeks ago
- ☆118 Updated 2 months ago
- Code for the Molmo2 Vision-Language Model ☆139 Updated last month
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos ☆63 Updated 4 months ago