hlchen23 / ADPN-MM
Repository for the ACM MM 2023 paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding"
☆47 Updated last year
Alternatives and similar repositories for ADPN-MM:
Users who are interested in ADPN-MM are comparing it to the repositories listed below
- ☆174 Updated 7 months ago
- ☆66 Updated 2 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆69 Updated 4 months ago
- This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams" ☆159 Updated last month
- ☆172 Updated 7 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆44 Updated 2 months ago
- 🔥🔥First-ever hour scale video understanding models ☆231 Updated last month
- This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension" ☆113 Updated 2 weeks ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆24 Updated last year
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ☆84 Updated 6 months ago
- 🌀 R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ☆73 Updated 7 months ago
- ☆38 Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 4 months ago
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation" ☆91 Updated 2 years ago
- ☆134 Updated last month
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆117 Updated 2 weeks ago
- [CVPR2024] MotionEditor is the first diffusion-based model capable of video motion editing. ☆154 Updated 7 months ago
- [ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners" ☆138 Updated 5 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆89 Updated 7 months ago
- Precision Search through Multi-Style Inputs ☆62 Updated 6 months ago
- ☆61 Updated last week
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ☆137 Updated 3 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆87 Updated 2 months ago
- ☆78 Updated 9 months ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆136 Updated 7 months ago
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT ☆136 Updated 9 months ago
- LMM which strictly superset LLM embedded ☆37 Updated 3 months ago
- ☆110 Updated 11 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆81 Updated 2 months ago