SCZwangxiao / RTQ-MM2023
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
☆15Updated 9 months ago
Related projects ⓘ
Alternatives and complementary repositories for RTQ-MM2023
- ☆27Updated last year
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆30Updated last month
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆39Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆41Updated 4 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences☆29Updated last month
- [NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆33Updated 2 weeks ago
- [ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization☆38Updated 2 years ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"☆61Updated 7 months ago
- ☆54Updated 4 months ago
- A reading list of papers about Visual Grounding.☆31Updated 2 years ago
- [ICCV 2023] Accurate and Fast Compressed Video Captioning☆34Updated 9 months ago
- Offical PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated last year
- [ACM MM 22] Correspondence Matters for Video Referring Expression Comprehension☆14Updated 2 years ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"☆98Updated 9 months ago
- ☆22Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆69Updated 9 months ago
- Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆39Updated last year
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.☆51Updated 2 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆45Updated 5 months ago
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Updated last year
- Turning to Video for Transcript Sorting☆46Updated last year
- [2023 ACL] CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding☆28Updated last year
- ☆30Updated 2 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆48Updated 5 months ago
- ☆36Updated 7 months ago
- ☆47Updated 2 years ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆27Updated 2 months ago
- (CVPR 2023) Official implemention of the paper "Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos…☆27Updated 7 months ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)☆29Updated 7 months ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆61Updated 5 months ago