Skyline-9 / Visionary-Vids
Multi-modal transformer approach for natural language query based joint video summarization and highlight detection
☆13 Updated 7 months ago
Alternatives and similar repositories for Visionary-Vids:
Users who are interested in Visionary-Vids are comparing it to the repositories listed below:
- [ICCV 2023] Accurate and Fast Compressed Video Captioning ☆36 Updated 11 months ago
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning" ☆18 Updated last year
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆41 Updated 3 weeks ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 Updated 5 months ago
- [2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line ☆27 Updated last year
- MUSIC-AVQA, CVPR2022 (ORAL) ☆72 Updated 2 years ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆99 Updated 11 months ago
- Code for paper, "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" ECCV 2022 ☆37 Updated last year
- Source code of our MM'22 paper Partially Relevant Video Retrieval ☆52 Updated 2 months ago
- Towards Long Form Audio-visual Video Understanding ☆11 Updated 2 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆24 Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆71 Updated 11 months ago
- Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos ☆22 Updated 6 months ago
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral) ☆20 Updated 5 months ago
- Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies" ☆17 Updated last year
- Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports ☆32 Updated last year
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval ☆38 Updated last year
- ☆54 Updated 2 years ago
- (TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information ☆25 Updated 3 weeks ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆48 Updated 4 months ago
- Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers" ☆18 Updated last year
- The official codebase of FineAction dataset. We will update the data and code of our FineAction. ☆17 Updated 2 years ago
- Official implementation of TagAlign ☆34 Updated last month
- ☆22 Updated 3 months ago
- Narrative movie understanding benchmark ☆63 Updated 8 months ago
- Official pytorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape… ☆38 Updated 3 weeks ago
- ☆28 Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆16 Updated 11 months ago
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation" ☆28 Updated 3 months ago
- Unified Audio-Visual Perception for Multi-Task Video Localization ☆24 Updated 9 months ago