☆14 · Dec 31, 2024 · Updated last year
Alternatives and similar repositories for VIPCAP
Users who are interested in VIPCAP are comparing it to the libraries listed below.
- [EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning ☆15 · May 13, 2025 · Updated 10 months ago
- ☆10 · Jul 5, 2024 · Updated last year
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆126 · Feb 13, 2024 · Updated 2 years ago
- ☆12 · May 3, 2024 · Updated last year
- Nearest Neighbor Normalization (EMNLP 2024) ☆20 · Nov 1, 2024 · Updated last year
- Cross-modal Active Complementary Learning with Self-refining Correspondence (NeurIPS 2023, Pytorch Code) ☆15 · Jun 6, 2024 · Updated last year
- [ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap ☆12 · Jun 18, 2025 · Updated 9 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆62 · Apr 8, 2024 · Updated last year
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" ☆55 · Mar 28, 2024 · Updated last year
- Implementation of our paper, 'Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval.' ☆28 · Dec 3, 2023 · Updated 2 years ago
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature. ☆39 · Feb 13, 2025 · Updated last year
- ☆18 · Feb 20, 2025 · Updated last year
- THE ART of MULTIPROCESSOR PROGRAMMING, Maurice Herlihy & Nir Shavit ☆10 · Feb 12, 2023 · Updated 3 years ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆32 · Mar 26, 2025 · Updated 11 months ago
- ☆22 · Apr 27, 2024 · Updated last year
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024 ☆33 · Jun 18, 2025 · Updated 9 months ago
- ☆36 · Mar 28, 2024 · Updated last year
- Project to provide driver guidance through object recognition in the vehicle driving environment: Display bounding boxes on objects in im… ☆20 · Aug 25, 2024 · Updated last year
- [ICLR 2025] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval ☆26 · Feb 13, 2025 · Updated last year
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆33 · Mar 15, 2024 · Updated 2 years ago
- An official pytorch implementation of the paper: [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval]. ☆14 · Jul 27, 2024 · Updated last year
- The official code and model for ACL 2023 paper 'mCLIP: Multilingual CLIP via Cross-lingual Transfer' ☆10 · Jan 23, 2024 · Updated 2 years ago
- ☆10 · Apr 7, 2024 · Updated last year
- ☆30 · Jul 21, 2025 · Updated 8 months ago
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets" ☆23 · Mar 18, 2025 · Updated last year
- ☆14 · Oct 14, 2019 · Updated 6 years ago
- Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation ☆38 · Mar 3, 2025 · Updated last year
- ☆10 · Feb 13, 2025 · Updated last year
- ☆13 · Jul 13, 2024 · Updated last year
- Automatic Generation of Scaffolding Questions for Learning Math, EMNLP 2022. RL, REINFORCE ☆25 · Jun 30, 2023 · Updated 2 years ago
- ☆12 · Apr 19, 2024 · Updated last year
- Code repository for "Post-pre-training for Modality Alignment in Vision-Language Foundation Models" (CVPR2025) ☆38 · Jul 25, 2025 · Updated 7 months ago
- ☆17 · Mar 5, 2024 · Updated 2 years ago
- [2024] INSANet: INtra-INter Spectral Attention Network for Effective Feature Fusion of Multispectral Pedestrian Detection, Sensors. ☆23 · Mar 20, 2024 · Updated 2 years ago
- [NeurIPS '24] Frustratingly easy Test-Time Adaptation of VLMs!! ☆61 · Mar 24, 2025 · Updated 11 months ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆57 · Aug 16, 2024 · Updated last year
- SotA text-only image/video method (IJCAI 2023) ☆15 · Jan 9, 2024 · Updated 2 years ago
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data ☆45 · Oct 15, 2023 · Updated 2 years ago
- ☆28 · Jul 9, 2025 · Updated 8 months ago