jacobmarks / awesome-clip-papers
The most impactful papers related to contrastive pretraining for multimodal models!
☆38, updated 6 months ago
Related projects:
- [CVPR24] Official implementation of GEM (Grounding Everything Module) (☆72, updated 9 months ago)
- Code for the experiments in "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" (☆92, updated last week)
- [ICML 2024] Official implementation of the paper "Rejuvenating image-GPT as Strong Visual Representation Lea…" (☆96, updated 4 months ago)
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" (☆107, updated last month)
- Open-source implementation of "Vision Transformers Need Registers" (☆126, updated last week)
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) (☆97, updated 5 months ago)
- Object Recognition as Next Token Prediction (CVPR 2024) (☆153, updated 2 months ago)
- Code for the CVPR'23 tutorial "All Things ViTs: Understanding and Interpreting Attention in Vision" (☆166, updated last year)
- Learning from synthetic data: code and models (☆293, updated 8 months ago)
- Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" (https://arxiv.org/abs/2303.13496) (☆75, updated last month)
- PyTorch implementation of R-MAE (https://arxiv.org/abs/2306.05411) (☆106, updated last year)
- Official repository of the paper "Subobject-level Image Tokenization" (☆58, updated 4 months ago)
- When do we not need larger vision models? (☆314, updated last month)
- Official implementation of the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" (☆251, updated 4 months ago)
- Code for the paper "Hyperbolic Image-Text Representations" (Desai et al., ICML 2023) (☆127, updated last year)
- [NeurIPS 2022] Official implementation of "Learning to Discover and Detect Objects" (☆107, updated last year)
- Code release for "Improved baselines for vision-language pre-training" (☆54, updated 4 months ago)
- Codebase for SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs (☆84, updated 5 months ago)
- Official code repository of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs (☆22, updated 3 months ago)
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" (☆181, updated 8 months ago)
- VLM Evaluation: a benchmark for VLMs spanning text-generation tasks from VQA to captioning (☆77, updated last week)
- [CVPR 2023] Learning Visual Representations via Language-Guided Sampling (☆142, updated last year)
- Code for the NeurIPS 2023 paper "Vocabulary-free Image Classification" (☆101, updated 7 months ago)
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts (☆275, updated 2 months ago)
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) (☆149, updated 9 months ago)