EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆345 · Updated 8 months ago
Alternatives and similar repositories for EgoLife
Users who are interested in EgoLife are comparing it to the repositories listed below.
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆127 · Updated 3 months ago
- Official repo and evaluation implementation of VSI-Bench ☆631 · Updated 3 months ago
- 📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning. ☆332 · Updated last week
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆272 · Updated 11 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆277 · Updated last month
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆145 · Updated 8 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆194 · Updated 6 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆386 · Updated 5 months ago
- This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams" ☆247 · Updated last month
- A paper list for spatial reasoning ☆222 · Updated this week
- Cambrian-S: Towards Spatial Supersensing in Video ☆375 · Updated last week
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆76 · Updated 10 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆97 · Updated 4 months ago
- [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning ☆227 · Updated last month
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆158 · Updated last month
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆283 · Updated last year
- Structured Video Comprehension of Real-World Shorts ☆216 · Updated 2 months ago
- Official code for MotionBench (CVPR 2025) ☆59 · Updated 8 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆71 · Updated 2 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆348 · Updated last month
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆281 · Updated 11 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆412 · Updated 6 months ago
- ☆100 · Updated 3 weeks ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆191 · Updated 3 months ago
- Long Context Transfer from Language to Vision ☆397 · Updated 8 months ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ☆148 · Updated 2 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆130 · Updated last year
- Pixel-Level Reasoning Model trained with RL [NeurIPS25] ☆250 · Updated 2 weeks ago
- ☆104 · Updated 4 months ago
- Accepted by CVPR 2024 ☆39 · Updated last year