EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆262 · Updated last month
Alternatives and similar repositories for EgoLife:
Users who are interested in EgoLife are comparing it to the repositories listed below:
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆172 · Updated this week
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆178 · Updated 3 months ago
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆249 · Updated 4 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆127 · Updated 5 months ago
- A Unified Tokenizer for Visual Generation and Understanding ☆256 · Updated last week
- Official repo and evaluation implementation of VSI-Bench ☆463 · Updated last month
- Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment" ☆207 · Updated last year
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆215 · Updated 7 months ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks ☆74 · Updated last week
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆193 · Updated 4 months ago
- Long Context Transfer from Language to Vision ☆372 · Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆104 · Updated 3 weeks ago
- [ARXIV'25] GameFactory: Creating New Games with Generative Interactive Videos ☆281 · Updated last month
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video … ☆118 · Updated 3 weeks ago
- ☆368 · Updated last month
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆105 · Updated last week
- (ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life ☆343 · Updated 4 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆99 · Updated 3 weeks ago
- ☆126 · Updated 3 months ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆102 · Updated 5 months ago
- [CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Langu… ☆283 · Updated 9 months ago
- [Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos ☆310 · Updated 7 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆170 · Updated 4 months ago
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video] ☆446 · Updated last week
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ☆197 · Updated last year
- An open-source lightweight game generation paradigm. It includes everything from data processing to model architecture design and playabi… ☆83 · Updated 3 months ago
- ☆158 · Updated last month
- 🔥🔥 First-ever hour-scale video understanding models ☆286 · Updated this week
- ☆183 · Updated 9 months ago
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework. ☆61 · Updated last week