Awesome Vision-Language Pretraining Papers
☆41Jan 15, 2025Updated last year
Alternatives and similar repositories for awesome-vision-language-pretraining
Users that are interested in awesome-vision-language-pretraining are comparing it to the libraries listed below
Sorting:
- ☆11Sep 15, 2023Updated 2 years ago
- X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization, CVPR 2024☆11Nov 7, 2024Updated last year
- [NeurIPS 2024] Official implementation of "Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance"☆17Dec 4, 2024Updated last year
- Retrieval-augmented Image Captioning☆13Feb 16, 2023Updated 3 years ago
- LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections (NeurIPS 2023)☆29Dec 27, 2023Updated 2 years ago
- Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text☆24Aug 15, 2022Updated 3 years ago
- ☆49Feb 18, 2025Updated last year
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training☆27Dec 5, 2023Updated 2 years ago
- SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models☆21Jan 11, 2024Updated 2 years ago
- ☆24Oct 9, 2023Updated 2 years ago
- The code of the paper "Negative Pre-aware for Noisy Cross-modal Matching" in AAAI 2024.☆30Jul 2, 2025Updated 8 months ago
- Some papers about *diverse* image (a few videos) captioning☆26Apr 4, 2023Updated 2 years ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆28Aug 15, 2025Updated 6 months ago
- Vecna is a Python chatbot which recommends songs and movies depending upon your feelings☆12Jun 28, 2022Updated 3 years ago
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training☆38Mar 27, 2025Updated 11 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆37Aug 18, 2024Updated last year
- ☆57Jan 7, 2023Updated 3 years ago
- Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models☆27Nov 29, 2023Updated 2 years ago
- ☆34Mar 7, 2024Updated 2 years ago
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation☆30Feb 4, 2024Updated 2 years ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆35Jun 30, 2025Updated 8 months ago
- ☆30Aug 14, 2023Updated 2 years ago
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆31May 16, 2024Updated last year
- [CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?☆35Apr 27, 2023Updated 2 years ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆138May 8, 2025Updated 10 months ago
- ☆75Dec 12, 2025Updated 2 months ago
- We archive data because we are interested in the diffs. All data is from https://video-api.cartoonnetwork.com. We run the check every min…☆10Updated this week
- ☆10Jul 29, 2022Updated 3 years ago
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024)☆34May 8, 2024Updated last year
- Extract information from XBRL files in the ESEF format☆13Jan 3, 2026Updated 2 months ago
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022☆11Apr 13, 2025Updated 10 months ago
- This project predicts wind turbine failure using numerous sensor data by applying classification based ML models that improves prediction…☆11Mar 20, 2023Updated 2 years ago
- Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*☆30Apr 16, 2021Updated 4 years ago
- ☆40Apr 8, 2024Updated last year
- [ICCV 2023] Bayesian Prompt Learning for Image-Language Model Generalization☆39Oct 6, 2023Updated 2 years ago
- ☆82Nov 6, 2023Updated 2 years ago
- Deep Cross-Modal Projection Learning for Image-Text Matching☆77Sep 2, 2020Updated 5 years ago
- [ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…☆82Feb 22, 2025Updated last year