[NeurIPS 2023] The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS' 2023.
☆27May 14, 2024Updated last year
Alternatives and similar repositories for PAU
Users that are interested in PAU are comparing it to the libraries listed below
Sorting:
- A simple pytorch implementation of baseline based-on CLIP for Image-text Matching.☆19May 25, 2023Updated 2 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆17Jul 14, 2025Updated 7 months ago
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training☆27Dec 5, 2023Updated 2 years ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆66Jun 7, 2024Updated last year
- The code of "Image-text Retrieval via Preserving Main Semantic of Vision" in ICME 2023.☆15Dec 25, 2023Updated 2 years ago
- Implementation of "DIME-FM: DIstilling Multimodal and Efficient Foundation Models"☆15Oct 12, 2023Updated 2 years ago
- ☆81Nov 6, 2023Updated 2 years ago
- ☆30Aug 14, 2023Updated 2 years ago
- [CVPR 2024] TeachCLIP for Text-to-Video Retrieval☆42May 7, 2025Updated 9 months ago
- Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024☆14Jan 3, 2024Updated 2 years ago
- [ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"☆84Jul 4, 2024Updated last year
- SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024)☆18Apr 28, 2024Updated last year
- ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models☆45Dec 21, 2023Updated 2 years ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆37Oct 18, 2023Updated 2 years ago
- Official github repo for ICCV2023 paper 'Multi-event Video-Text Retrieval'☆19Feb 16, 2024Updated 2 years ago
- [AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval☆20May 10, 2024Updated last year
- This is a summary of research on noisy correspondence. There may be omissions. If anything is missing please get in touch with us. Our em…☆77Nov 7, 2025Updated 3 months ago
- The code of the paper of "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval" accepted b…☆19Jan 16, 2024Updated 2 years ago
- [ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.☆19Jun 7, 2024Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆61Apr 8, 2024Updated last year
- offical implementation of "Calibrating Multimodal Learning" on ICML 2023☆20Jun 5, 2023Updated 2 years ago
- Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024☆21May 30, 2024Updated last year
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding☆55Apr 7, 2025Updated 10 months ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment☆53Apr 9, 2024Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)☆55Sep 7, 2023Updated 2 years ago
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".☆118Oct 9, 2025Updated 4 months ago
- The code of the paper "Negative Pre-aware for Noisy Cross-modal Matching" in AAAI 2024.☆30Jul 2, 2025Updated 8 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- Composed Video Retrieval☆62May 2, 2024Updated last year
- Counterfactual Reasoning VQA Dataset☆28Nov 23, 2023Updated 2 years ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆37Aug 18, 2024Updated last year
- Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023)☆27Apr 24, 2023Updated 2 years ago
- An Enhanced CLIP Framework for Learning with Synthetic Captions☆40Apr 18, 2025Updated 10 months ago
- Official repository of "Chatting Makes Perfect: Chat-based Image Retrieval"☆31Feb 5, 2025Updated last year
- Weakly Supervised Video Moment Localisation with Contrastive Negative Sample Mining☆30Apr 4, 2022Updated 3 years ago
- [ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model☆140Apr 9, 2024Updated last year
- LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections (NeurIPS 2023)☆29Dec 27, 2023Updated 2 years ago
- The official implementation for BLIP4CIR with bi-directional training | Bi-directional Training for Composed Image Retrieval via Text Pro…☆34Feb 7, 2024Updated 2 years ago