InvincibleWyq / ChatVID
Chat about anything on any video!
☆34 · Updated last year
Related projects
Alternatives and complementary repositories for ChatVID
- Accepted by CVPR 2024 ☆27 · Updated 5 months ago
- ☆13 · Updated 8 months ago
- The paper collections for the autoregressive models in vision. ☆95 · Updated this week
- [CVPR 2024] Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training ☆35 · Updated 6 months ago
- The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆68 · Updated 2 weeks ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆96 · Updated 6 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆101 · Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆75 · Updated 2 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆83 · Updated 2 weeks ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆126 · Updated 2 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆123 · Updated last month
- Official implementation for BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way ☆18 · Updated 3 weeks ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆96 · Updated 3 months ago
- The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery", CVPR 2024 ☆33 · Updated 5 months ago
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input ☆54 · Updated 2 months ago
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! ☆117 · Updated 10 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆81 · Updated 4 months ago
- 🔥 stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆40 · Updated 4 months ago
- Official implementation of MIA-DPO ☆32 · Updated this week
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆224 · Updated 9 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆40 · Updated last week
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ☆73 · Updated 3 months ago
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆205 · Updated this week
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆65 · Updated 4 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆145 · Updated last month
- [CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments". ☆223 · Updated 4 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆123 · Updated 3 months ago
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆120 · Updated 2 weeks ago