EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆278 Updated last month
Alternatives and similar repositories for EgoLife
Users who are interested in EgoLife are comparing it to the libraries listed below.
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆191 Updated 2 weeks ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆181 Updated 4 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆130 Updated 5 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆218 Updated 7 months ago
- Official repo and evaluation implementation of VSI-Bench ☆481 Updated 2 months ago
- A Unified Tokenizer for Visual Generation and Understanding ☆290 Updated last week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆114 Updated last week
- [ARXIV'25] GameFactory: Creating New Games with Generative Interactive Videos ☆288 Updated last month
- Matrix-Game: Interactive World Foundation Model ☆164 Updated this week
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆121 Updated last week
- [ICML 2025] Official PyTorch implementation of LongVU ☆370 Updated last week
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆252 Updated 5 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆105 Updated last month
- Awesome Unified Multimodal Models ☆180 Updated last week
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆363 Updated 3 weeks ago
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆199 Updated 5 months ago
- ☆126 Updated 4 months ago
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video … ☆126 Updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video] ☆515 Updated this week
- Long Context Transfer from Language to Vision ☆374 Updated last month
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆315 Updated 2 weeks ago
- VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling ☆405 Updated this week
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks ☆104 Updated last week
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆60 Updated 4 months ago
- This is an automatic full segmentation tool based on Segment-Anything-2 and Segment-Anything-1. Our tool performs automatic full segmenta… ☆170 Updated last week
- Generative World Explorer ☆143 Updated 5 months ago
- DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness ☆119 Updated last month
- An open-source lightweight game generation paradigm. It includes everything from data processing to model architecture design and playabi… ☆86 Updated 4 months ago
- Large Motion Model for Unified Multi-Modal Motion Generation ☆270 Updated 4 months ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ☆198 Updated last year