JeffWang987 / EgoVid
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
☆102Updated 5 months ago
Alternatives and similar repositories for EgoVid:
Users who are interested in EgoVid are comparing it to the repositories listed below
- A comprehensive list of papers investigating physical cognition in video generation, with code and related websites.☆62Updated this week
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation☆90Updated this week
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".☆80Updated last month
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities☆71Updated 6 months ago
- Code & Data for Grounded 3D-LLM with Referent Tokens☆109Updated 3 months ago
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning☆37Updated 4 months ago
- [NeurIPS 2024] Official code repository for MSR3D paper☆50Updated last month
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆69Updated last month
- Unifying 2D and 3D Vision-Language Understanding☆69Updated last week
- ☆126Updated 3 months ago
- Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning.☆50Updated last week
- Official implementation of the paper "Unifying 3D Vision-Language Understanding via Promptable Queries"☆73Updated 8 months ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".☆18Updated 2 weeks ago
- [ECCV 2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation …☆37Updated last month
- ☆22Updated 2 weeks ago
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding☆86Updated 2 months ago
- ☆158Updated last month
- Code for paper "Grounding Video Models to Actions through Goal Conditioned Exploration".☆44Updated 3 months ago
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video☆64Updated this week
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (arXiv 2025)☆28Updated last month
- Generative World Explorer☆141Updated 4 months ago
- DELTA: Dense Efficient Long-range 3D Tracking for Any video (ICLR 2025)☆87Updated 2 weeks ago
- [ICLR 2023] SQA3D for embodied scene understanding and reasoning☆131Updated last year
- Official Implementation of paper "Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence"☆120Updated 2 weeks ago
- IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos☆41Updated 2 weeks ago
- The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs"☆51Updated 2 weeks ago
- ☆49Updated 6 months ago
- ☆46Updated 4 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆60Updated 6 months ago
- Latent Motion Token as the Bridging Language for Robot Manipulation☆81Updated 3 weeks ago