hrtang22 / MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval"
☆11 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for MUSE
- ☆27 · Updated last year
- [ACM MM 22] Correspondence Matters for Video Referring Expression Comprehension ☆14 · Updated 2 years ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆29 · Updated last month
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆40 · Updated 4 months ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. ☆29 · Updated 3 weeks ago
- ☆22 · Updated last year
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning ☆20 · Updated 10 months ago
- ☆33 · Updated last year
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆30 · Updated last week
- LLMBind: A Unified Modality-Task Integration Framework ☆15 · Updated 4 months ago
- A reading list of papers about Visual Grounding. ☆31 · Updated 2 years ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆61 · Updated 5 months ago
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs" ☆17 · Updated last month
- [CVPR' 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆40 · Updated 3 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆44 · Updated 4 months ago
- Source code of our CVPR2024 paper TeachCLIP for Text-to-Video Retrieval ☆17 · Updated last week
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆78 · Updated 7 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆39 · Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆48 · Updated 5 months ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ☆61 · Updated 7 months ago
- ☆12 · Updated last year
- ☆32 · Updated 11 months ago
- [CVPR 2023] Pytorch Code of MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering ☆16 · Updated last year
- [CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning ☆109 · Updated 7 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆55 · Updated 2 weeks ago
- An official implementation for MS-DETR in ACL'23 ☆16 · Updated last year
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆64 · Updated 3 weeks ago
- Task Residual for Tuning Vision-Language Models (CVPR 2023) ☆65 · Updated last year
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆50 · Updated last month