farewellthree / PPLLaVA
Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
☆130 · Updated last year
Alternatives and similar repositories for PPLLaVA
Users interested in PPLLaVA are comparing it to the repositories listed below.
- VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆284 · Updated last month
- [ICML 2025] Official PyTorch implementation of LongVU ☆412 · Updated 6 months ago
- [ACL 2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆81 · Updated 6 months ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ☆393 · Updated this week
- A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions. ☆159 · Updated 10 months ago
- ☆184 · Updated 4 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024) ☆169 · Updated last year
- ☆80 · Updated 8 months ago
- ☆141 · Updated 4 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆128 · Updated last year
- Official code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ☆144 · Updated last year
- MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning ☆263 · Updated 8 months ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆312 · Updated last month
- The official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams" ☆251 · Updated last month
- [ACL 2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆109 · Updated 3 months ago
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆285 · Updated last year
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, image, and video data. ☆263 · Updated this week
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆146 · Updated 8 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆119 · Updated 3 weeks ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆280 · Updated last year
- ☆201 · Updated last year
- [ICLR 2025] The official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video …" ☆157 · Updated 8 months ago
- Repository for the MM'23 accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…" ☆52 · Updated last year
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing" ☆156 · Updated last year
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆351 · Updated 8 months ago
- [NeurIPS 2025] The official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…" ☆351 · Updated last month
- [IJCV'24] AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ☆151 · Updated last year
- Long Context Transfer from Language to Vision ☆398 · Updated 8 months ago
- Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with g… ☆502 · Updated 3 months ago
- ☆35 · Updated 10 months ago