EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆244 Updated last week
Alternatives and similar repositories for EgoLife:
Users who are interested in EgoLife are comparing it to the repositories listed below.
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆173 Updated 3 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆208 Updated 6 months ago
- [ARXIV'25] GameFactory: Creating New Games with Generative Interactive Videos ☆275 Updated last week
- Official repo and evaluation implementation of VSI-Bench ☆423 Updated last month
- A Unified Tokenizer for Visual Generation and Understanding ☆216 Updated 3 weeks ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆126 Updated 4 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆93 Updated last week
- Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework. ☆53 Updated last week
- A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆229 Updated 4 months ago
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆183 Updated 3 months ago
- Long Context Transfer from Language to Vision ☆368 Updated last week
- ☆366 Updated last month
- [Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos ☆305 Updated 6 months ago
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video … ☆110 Updated this week
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first work to systematically explore R1 for video] ☆205 Updated this week
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆179 Updated last week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆81 Updated this week
- ☆122 Updated 2 months ago
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens ☆195 Updated last year
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆338 Updated this week
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆154 Updated 3 months ago
- Official Implementation of Video-T1: Test-Time Scaling for Video Generation ☆70 Updated this week
- [CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation ☆52 Updated 2 weeks ago
- This is an automatic full segmentation tool based on Segment-Anything-2 and Segment-Anything-1. Our tool performs automatic full segmenta… ☆159 Updated 5 months ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆98 Updated 4 months ago
- [CVPR 2025] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆172 Updated last week
- Accepted by CVPR 2024 ☆32 Updated 10 months ago
- ☆62 Updated last week
- Generative World Explorer ☆138 Updated 4 months ago
- An open-source lightweight game generation paradigm. It includes everything from data processing to model architecture design and playabi… ☆81 Updated 2 months ago