adxcreative / EERCF
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
☆20Updated 8 months ago
Alternatives and similar repositories for EERCF
Users that are interested in EERCF are comparing it to the libraries listed below
 - The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆43Updated last year
 - Research Code for Multimodal-Cognition Team in Ant Group☆169Updated 2 weeks ago
 - Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated 2 years ago
 - Lion: Kindling Vision Intelligence within Large Language Models☆51Updated last year
 - [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption☆101Updated 2 years ago
 - Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆94Updated 9 months ago
 - ☆30Updated 2 years ago
 - ☆118Updated last year
 - The official implementation of RAR☆92Updated last year
 - [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆152Updated 2 months ago
 - ☆80Updated 11 months ago
 - [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆105Updated last year
 - UniMD: Towards Unifying Moment retrieval and temporal action Detection☆54Updated last year
 - Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆32Updated 7 months ago
 - Visual Instruction Tuning for Qwen2 Base Model☆39Updated last year
 - Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c…☆43Updated 11 months ago
 - [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives☆39Updated last month
 - ☆21Updated 10 months ago
 - [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant☆171Updated 3 months ago
 - Implementation of PyramidCLIP(NeurIPS2022).☆31Updated 2 years ago
 - [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆129Updated 2 months ago
 - [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Updated 2 years ago
 - Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"☆107Updated last year
 - ☆93Updated 2 months ago
 - LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆70Updated 5 months ago
 - ☆52Updated 7 months ago
 - Official implementation of TagAlign☆35Updated 10 months ago
 - ☆87Updated last year
 - [CVPR 2024] TeachCLIP for Text-to-Video Retrieval☆40Updated 5 months ago
 - ☆33Updated 3 months ago