farewellthree / PPLLaVA
Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
☆72Updated this week
Related projects
Alternatives and complementary repositories for PPLLaVA
- ☆259Updated last week
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆142Updated 3 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆98Updated last month
- ☆165Updated 4 months ago
- 🔥🔥First-ever hour-scale video understanding models☆156Updated 2 weeks ago
- A Training-free Iterative Framework for Long Story Visualization☆59Updated last month
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆126Updated 9 months ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆128Updated 3 months ago
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing"☆79Updated 3 weeks ago
- Multimodal Models in Real World☆400Updated 2 weeks ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆132Updated 2 weeks ago
- Live2Diff: A Pipeline that processes Live video streams by a uni-directional video Diffusion model.☆166Updated 3 months ago
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024☆261Updated 6 months ago
- Video-Infinity generates long videos quickly using multiple GPUs without extra training.☆163Updated 3 months ago
- [ECCV 2024] Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models☆56Updated 2 weeks ago
- ☆116Updated 2 months ago
- Long Context Transfer from Language to Vision☆328Updated 2 weeks ago
- Code for the SIGGRAPH Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation"☆33Updated 2 months ago
- [arXiv 2024] Official PyTorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion…☆147Updated 7 months ago
- ☆164Updated 4 months ago
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community …☆53Updated this week
- ☆145Updated 3 weeks ago
- Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆113Updated last month
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models☆166Updated last month
- Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding☆213Updated 3 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆62Updated 3 weeks ago
- ☆145Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated this week
- Let's finetune video generation models!☆186Updated this week
- LLaVA-HR: High-Resolution Large Language-Vision Assistant☆213Updated 3 months ago