farewellthree / PPLLaVA
Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
☆126Updated 4 months ago
Alternatives and similar repositories for PPLLaVA:
Users who are interested in PPLLaVA are comparing it to the repositories listed below
- ☆176Updated 8 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction☆92Updated this week
- ☆364Updated 3 weeks ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆171Updated 2 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆117Updated 4 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models☆206Updated 6 months ago
- ☆70Updated last week
- 🔥🔥 First-ever hour-scale video understanding models☆253Updated this week
- [AAAI 2025] StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization☆202Updated this week
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆124Updated last month
- Multimodal Models in Real World☆447Updated 3 weeks ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆221Updated 3 weeks ago
- Long Context Transfer from Language to Vision☆368Updated this week
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆157Updated 7 months ago
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort☆149Updated 3 months ago
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems".☆146Updated 2 weeks ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆138Updated 4 months ago
- ☆72Updated last week
- ☆180Updated 8 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆133Updated last month
- ☆22Updated 2 months ago
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing"☆111Updated 4 months ago