UCSC-VLAA / CLIPA

[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"

☆318

Alternatives and similar repositories for CLIPA

Users who are interested in CLIPA are comparing it to the repositories listed below.

facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆191 Updated last year
LAION-AI / scaling-laws-openclip
Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)
☆178 Updated 4 months ago
google-research / syn-rep-learn
Learning from synthetic data - code and models
☆323 Updated last year
bfshi / scaling_on_scales
When do we not need larger vision models?
☆409 Updated 8 months ago
LijieFan / LaCLIP
[NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"
☆286 Updated last year
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 Updated 2 years ago
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
☆142 Updated last year
LightDXY / FT-CLIP
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
☆223 Updated 2 years ago
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆211 Updated last year
Understanding-Visual-Datasets / VisDiff
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆121 Updated last year
allenai / unified-io-inference
☆227 Updated last year
YuchenLiu98 / COMM
PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"
☆204 Updated 9 months ago
ryanwebster90 / snip-dedup
☆103 Updated last year
amazon-science / prompt-pretraining
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
☆258 Updated last year
JialianW / GRiT
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV 2024)
☆336 Updated last year
facebookresearch / flip
Official Open Source code for "Scaling Language-Image Pre-training via Masking"
☆428 Updated 2 years ago
snap-research / MyVLM
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
☆179 Updated last year
facebookresearch / paco
This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts…
☆286 Updated last year
apple / ml-veclip
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
☆246 Updated 9 months ago
yukw777 / EILEV
EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties
☆131 Updated 11 months ago
kohjingyu / fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
☆482 Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆112 Updated 9 months ago
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
bfshi / TOAST
Official code for "TOAST: Transfer Learning via Attention Steering"
☆186 Updated 2 years ago
google-deepmind / perception_test
☆235 Updated 4 months ago
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆156 Updated 10 months ago
WisconsinAIVision / ViP-LLaVA
[CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆331 Updated last year
facebookresearch / meru
Code for the paper "Hyperbolic Image-Text Representations", Desai et al, ICML 2023
☆183 Updated 2 years ago
JourneyDB / JourneyDB
☆178 Updated 2 years ago
kyegomez / NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
☆258 Updated 2 weeks ago