☆14 · Dec 31, 2024 · Updated last year
Alternatives and similar repositories for VIPCAP
Users that are interested in VIPCAP are comparing it to the libraries listed below
- [EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning ☆15 · May 13, 2025 · Updated 9 months ago
- ☆10 · Jul 5, 2024 · Updated last year
- [ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap ☆12 · Jun 18, 2025 · Updated 8 months ago
- ☆12 · May 3, 2024 · Updated last year
- Cross-modal Active Complementary Learning with Self-refining Correspondence (NeurIPS 2023, Pytorch Code) ☆15 · Jun 6, 2024 · Updated last year
- Nearest Neighbor Normalization (EMNLP 2024) ☆19 · Nov 1, 2024 · Updated last year
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆126 · Feb 13, 2024 · Updated 2 years ago
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" ☆55 · Mar 28, 2024 · Updated last year
- ☆22 · Apr 27, 2024 · Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆61 · Apr 8, 2024 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆32 · Mar 26, 2025 · Updated 11 months ago
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature. ☆34 · Feb 13, 2025 · Updated last year
- Implementation of our paper, 'Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval.' ☆28 · Dec 3, 2023 · Updated 2 years ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆33 · Mar 15, 2024 · Updated last year
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024 ☆33 · Jun 18, 2025 · Updated 8 months ago
- ☆36 · Mar 28, 2024 · Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆43 · Mar 11, 2025 · Updated 11 months ago
- DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type. ☆23 · Feb 16, 2026 · Updated last week
- [COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding ☆44 · Nov 30, 2024 · Updated last year
- The implementation codes of paper: Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning ☆18 · May 8, 2025 · Updated 9 months ago
- [IEEE TIP] Official implementation for the work "BadCM: Invisible Backdoor Attack against Cross-Modal Learning". ☆14 · Aug 30, 2024 · Updated last year
- ☆10 · Apr 7, 2024 · Updated last year
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model ☆13 · Feb 15, 2024 · Updated 2 years ago
- [CVPR2025] Official code for Lost in Translation Found in Context ☆23 · Jan 14, 2026 · Updated last month
- Official implementation of Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information ☆11 · Sep 28, 2023 · Updated 2 years ago
- Improving Continuous Sign Language Recognition with Adapted Image Models ☆14 · Nov 10, 2025 · Updated 3 months ago
- AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025) ☆58 · Mar 1, 2025 · Updated last year
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data ☆46 · Oct 15, 2023 · Updated 2 years ago
- The official code and model for ACL 2023 paper 'mCLIP: Multilingual CLIP via Cross-lingual Transfer' ☆10 · Jan 23, 2024 · Updated 2 years ago
- Official repository for ACM Multimedia'24 paper "MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube a… ☆18 · Aug 11, 2024 · Updated last year
- [Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders. ☆14 · Jun 7, 2023 · Updated 2 years ago
- ☆16 · Aug 15, 2024 · Updated last year
- THE ART of MULTIPROCESSOR PROGRAMMING, Maurice Herlihy & Nir Shavit ☆10 · Feb 12, 2023 · Updated 3 years ago
- This is the repo for "Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition", CVPR2025. ☆20 · Dec 22, 2025 · Updated 2 months ago
- ☆12 · Apr 19, 2024 · Updated last year
- An official pytorch implementation of the paper: [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval]. ☆14 · Jul 27, 2024 · Updated last year
- ☆11 · May 17, 2024 · Updated last year
- ☆11 · Sep 1, 2024 · Updated last year
- Crossmodal Translation based Meta Weight Adaption for Robust Image-Text Sentiment Analysis ☆15 · May 16, 2024 · Updated last year