farewellthree / PPLLaVA
Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
☆126 · Updated 3 months ago
Alternatives and similar repositories for PPLLaVA:
Users who are interested in PPLLaVA are comparing it to the libraries listed below.
- ☆176 · Updated 8 months ago
- ☆360 · Updated last week
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆169 · Updated 2 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆117 · Updated 4 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024) ☆155 · Updated 7 months ago
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ☆148 · Updated 3 months ago
- ☆69 · Updated this week
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆122 · Updated last month
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆89 · Updated this week
- ☆21 · Updated 2 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆206 · Updated 5 months ago
- Multimodal Models in Real World ☆442 · Updated 2 weeks ago
- Official code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ☆138 · Updated 4 months ago
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization ☆201 · Updated 3 weeks ago
- 🔥🔥 First-ever hour-scale video understanding models ☆247 · Updated last week
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems". ☆145 · Updated last week
- An open source community implementation of the model from the paper: "Movie Gen: A Cast of Media Foundation Models". Join our community … ☆59 · Updated this week
- ☆177 · Updated 8 months ago
- ☆69 · Updated this week
- This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension" ☆152 · Updated 2 weeks ago
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions ☆129 · Updated last year
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing" ☆110 · Updated 4 months ago
- Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis. ☆61 · Updated 2 months ago
- Long Context Transfer from Language to Vision ☆367 · Updated 3 months ago