InvincibleWyq / ChatVID
Chat about anything on any video!
☆35 · Updated last year
Alternatives and similar repositories for ChatVID:
Users who are interested in ChatVID are comparing it to the repositories listed below.
- Diffusion Powers Video Tokenizer for Comprehension and Generation ☆64 · Updated 2 months ago
- Accepted by CVPR 2024 ☆31 · Updated 9 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆115 · Updated last month
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆45 · Updated 4 months ago
- ☆21 · Updated 9 months ago
- The repository contains the official implementation of "Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation" ☆30 · Updated 2 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆73 · Updated 3 weeks ago
- FQGAN: Factorized Visual Tokenization and Generation ☆42 · Updated last month
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆249 · Updated last month
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆90 · Updated 3 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆81 · Updated 5 months ago
- ☆47 · Updated 2 months ago
- ☆14 · Updated 11 months ago
- [CVPR 2024] Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training ☆37 · Updated 10 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆109 · Updated 9 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆95 · Updated 7 months ago
- The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆84 · Updated 3 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆138 · Updated 5 months ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆96 · Updated 2 months ago
- Official code for MotionBench ☆24 · Updated last month
- ☆47 · Updated 2 weeks ago
- Liquid: Language Models are Scalable Multi-modal Generators ☆63 · Updated 2 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆51 · Updated this week
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆49 · Updated 2 weeks ago
- A collection of vision foundation models unifying understanding and generation. ☆40 · Updated last month
- Official Pytorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral). ☆49 · Updated this week
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆37 · Updated 3 months ago
- Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics ☆77 · Updated last week
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆68 · Updated this week