ttharden / Keyframe-Extraction-for-video-summarization
☆31Updated 8 months ago
Alternatives and similar repositories for Keyframe-Extraction-for-video-summarization
Users who are interested in Keyframe-Extraction-for-video-summarization are comparing it to the libraries listed below.
- ☆187Updated 10 months ago
- ☆176Updated 10 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆49Updated last year
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆139Updated 6 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆163Updated 9 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆131Updated 3 months ago
- Incredibly descriptive audiovisual summaries for videos☆40Updated 9 months ago
- ☆75Updated 2 months ago
- Repository for the NeurIPS 2024 paper "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up…☆24Updated 5 months ago
- Precision Search through Multi-Style Inputs☆69Updated 3 weeks ago
- Narrative movie understanding benchmark☆70Updated last year
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆67Updated 2 months ago
- ☆73Updated last year
- ☆29Updated 8 months ago
- ☆13Updated 9 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆42Updated 10 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆26Updated last year
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆54Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆82Updated 6 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆130Updated 5 months ago
- Video dataset dedicated to portrait-mode video recognition.☆48Updated 5 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆146Updated 7 months ago
- Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding☆57Updated 3 weeks ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆50Updated last year
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"☆97Updated 3 weeks ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"☆181Updated 4 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 9 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 6 months ago
- ☆36Updated 8 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.☆71Updated 7 months ago