Max-Fu / tvl
☆55 · Updated 2 months ago
Related projects:
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆57 · Updated 5 months ago
- Binding Touch to Everything: Learning Unified Multimodal Tactile Representations ☆21 · Updated 6 months ago
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆76 · Updated 8 months ago
- ☆20 · Updated 3 months ago
- Code and data release for the paper "Learning Object State Changes in Videos: An Open-World Perspective" (CVPR 2024) ☆27 · Updated last week
- Language Repository for Long Video Understanding ☆27 · Updated 3 months ago
- ☆9 · Updated 11 months ago
- ☆53 · Updated 2 months ago
- Multimodal Video Understanding Framework (MVU) ☆23 · Updated 4 months ago
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning ☆141 · Updated 2 weeks ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆20 · Updated last year
- [ICLR 2023] SQA3D for embodied scene understanding and reasoning ☆115 · Updated 11 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆44 · Updated 2 months ago
- Code release for the paper "Egocentric Video Task Translation" (CVPR 2023 Highlight) ☆31 · Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 · Updated 3 months ago
- ☆63 · Updated 3 weeks ago
- ☆60 · Updated 2 months ago
- Official repository of paper "Subobject-level Image Tokenization" ☆58 · Updated 4 months ago
- ☆35 · Updated 2 weeks ago
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆17 · Updated 3 months ago
- ☆34 · Updated 4 months ago
- 🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆34 · Updated 2 months ago
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 · Updated 3 weeks ago
- The repo of paper "RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation" ☆43 · Updated 3 months ago
- Egocentric Video Understanding Dataset (EVUD) ☆19 · Updated 2 months ago
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ☆85 · Updated 2 months ago
- ☆72 · Updated 2 years ago
- Official repo for "iVideoGPT: Interactive VideoGPTs are Scalable World Models", https://arxiv.org/abs/2405.15223 ☆60 · Updated 2 weeks ago
- Official implementation of the CVPR'24 paper [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number] ☆19 · Updated 3 weeks ago
- ☆36 · Updated 5 months ago