kevinliang888 / IVR-QA-baselines
[ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers
☆11 Updated 5 months ago
Related projects:
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆50 Updated 3 months ago
- ☆13 Updated 2 weeks ago
- https://layer6ai-labs.github.io/xpool/ ☆111 Updated last year
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆31 Updated last month
- Official github repo for ICCV2023 paper 'Multi-event Video-Text Retrieval' ☆18 Updated 7 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆49 Updated 2 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆38 Updated 3 months ago
- Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023) ☆21 Updated last year
- [ECCV2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval ☆75 Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆38 Updated 2 months ago
- ☆24 Updated 5 months ago
- [CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning ☆103 Updated 5 months ago
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023] ☆104 Updated last year
- Source code of our MM'22 paper Partially Relevant Video Retrieval ☆51 Updated 2 years ago
- Cross Modal Retrieval with Querybank Normalisation ☆52 Updated 10 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆16 Updated 3 weeks ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆28 Updated 5 months ago
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21) ☆25 Updated last year
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment ☆44 Updated 5 months ago
- [arXiv22] Disentangled Representation Learning for Text-Video Retrieval ☆89 Updated 2 years ago
- ☆25 Updated last year
- [ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization ☆38 Updated 2 years ago
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The… ☆50 Updated 2 months ago
- Official implementation of "Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval (CVPR 2024 Highlight)" ☆44 Updated last month
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) ☆46 Updated last year
- "Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022. ☆63 Updated 2 years ago
- A reading list of papers about Visual Grounding. ☆31 Updated 2 years ago
- Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos ☆18 Updated 2 months ago
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023) ☆54 Updated 7 months ago
- Code for ECCV 2022 paper "Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding" ☆29 Updated last year