jpthu17 / DiffusionRet
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
☆125 Updated 10 months ago
Alternatives and similar repositories for DiffusionRet:
Users who are interested in DiffusionRet are comparing it to the libraries listed below.
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment ☆50 Updated 10 months ago
- [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations ☆130 Updated 10 months ago
- [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning ☆112 Updated last month
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆62 Updated 8 months ago
- 🌀 R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ☆73 Updated 7 months ago
- Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval" ☆77 Updated last month
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆48 Updated 5 months ago
- MomentDiff: Generative Video Moment Retrieval from Random to Real--NeurIPS 2023 ☆78 Updated last year
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ☆84 Updated 6 months ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. ☆36 Updated 4 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆47 Updated 7 months ago
- An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval" ☆148 Updated 10 months ago
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) ☆49 Updated last year
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆97 Updated 3 weeks ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆67 Updated 3 weeks ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆92 Updated 6 months ago
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval ☆38 Updated last year
- [Preprint] Number it: Temporal Grounding Videos like Flipping Manga ☆54 Updated 2 months ago
- Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆87 Updated 2 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆37 Updated 9 months ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆99 Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI 2025)" ☆18 Updated 2 weeks ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆103 Updated last month
- Official Implementation of SnAG (CVPR 2024) ☆41 Updated 3 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆28 Updated last month
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ☆94 Updated 7 months ago
- https://layer6ai-labs.github.io/xpool/ ☆118 Updated last year
- [NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding ☆47 Updated 11 months ago