mlfoundations/open_clip

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/mlfoundations/open_clip)

mlfoundations / open_clip

An open source implementation of CLIP.

☆14,006

Alternatives and similar repositories for open_clip

Users that are interested in open_clip are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

openai / CLIP
View on GitHub
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
☆34,038Mar 25, 2026Updated 3 months ago
salesforce / LAVIS
View on GitHub
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,253Jun 2, 2026Updated last month
rom1504 / clip-retrieval
View on GitHub
Easily compute clip embeddings and build a clip retrieval system with them
☆2,786Mar 28, 2026Updated 3 months ago
rom1504 / img2dataset
View on GitHub
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
☆4,436Oct 19, 2025Updated 9 months ago
haotian-liu / LLaVA
View on GitHub
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,930Aug 12, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
salesforce / BLIP
View on GitHub
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
☆5,716Mar 3, 2026Updated 4 months ago
mlfoundations / wise-ft
View on GitHub
Robust fine-tuning of zero-shot models
☆765Apr 29, 2022Updated 4 years ago
facebookresearch / dinov2
View on GitHub
PyTorch code and models for the DINOv2 self-supervised learning method.
☆13,124Jun 3, 2026Updated last month
huggingface / pytorch-image-models
View on GitHub
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights --…
☆36,993Updated this week
huggingface / diffusers
View on GitHub
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
☆34,119Updated this week
mlfoundations / open_flamingo
View on GitHub
An open-source framework for training large multimodal models.
☆4,114Aug 31, 2024Updated last year
baaivision / EVA
View on GitHub
EVA Series: Visual Representation Fantasies from BAAI
☆2,686Aug 1, 2024Updated last year
microsoft / unilm
View on GitHub
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,167Jan 23, 2026Updated 5 months ago
IDEA-Research / Grounded-Segment-Anything
View on GitHub
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and …
☆17,676Sep 5, 2024Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
facebookresearch / segment-anything
View on GitHub
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…
☆54,561Sep 18, 2024Updated last year
facebookresearch / MetaCLIP
View on GitHub
NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,846Nov 27, 2025Updated 7 months ago
microsoft / GLIP
View on GitHub
Grounded Language-Image Pre-training
☆2,605Jan 24, 2024Updated 2 years ago
IDEA-Research / GroundingDINO
View on GitHub
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
☆10,415Aug 12, 2024Updated last year
LAION-AI / CLIP_benchmark
View on GitHub
CLIP-like model evaluation
☆814Mar 19, 2026Updated 4 months ago
Dao-AILab / flash-attention
View on GitHub
Fast and memory-efficient exact attention
☆24,497Updated this week
CompVis / latent-diffusion
View on GitHub
High-Resolution Image Synthesis with Latent Diffusion Models
☆14,109Feb 29, 2024Updated 2 years ago
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,952Jul 2, 2026Updated 2 weeks ago
google-research / big_vision
View on GitHub
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,494May 19, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
lucidrains / vit-pytorch
View on GitHub
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Py…
☆25,423Jun 22, 2026Updated 3 weeks ago
OFA-Sys / Chinese-CLIP
View on GitHub
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
☆5,978Mar 31, 2026Updated 3 months ago
facebookresearch / DiT
View on GitHub
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
☆8,683May 31, 2024Updated 2 years ago
facebookresearch / mae
View on GitHub
PyTorch implementation of MAE https//arxiv.org/abs/2111.06377
☆8,366Jul 23, 2024Updated last year
lllyasviel / ControlNet
View on GitHub
Let us control diffusion models!
☆34,011Feb 25, 2024Updated 2 years ago
facebookresearch / xformers
View on GitHub
Hackable and optimized Transformers building blocks, supporting a composable construction.
☆10,524Updated this week
facebookresearch / dino
View on GitHub
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
☆7,605Jul 3, 2024Updated 2 years ago
huggingface / peft
View on GitHub
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,415Jul 14, 2026Updated last week
OpenGVLab / InternVL
View on GitHub
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
☆10,098Sep 22, 2025Updated 9 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
Zasder3 / train-CLIP
View on GitHub
A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
☆720Apr 15, 2022Updated 4 years ago
CompVis / taming-transformers
View on GitHub
Taming Transformers for High-Resolution Image Synthesis
☆6,519Jul 30, 2024Updated last year
LLaVA-VL / LLaVA-NeXT
View on GitHub
☆4,709Jun 15, 2026Updated last month
QwenLM / Qwen-VL
View on GitHub
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,710Aug 7, 2024Updated last year
QwenLM / Qwen3-VL
View on GitHub
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,630Jan 30, 2026Updated 5 months ago
microsoft / Swin-Transformer
View on GitHub
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
☆16,003Jul 24, 2024Updated last year
KaiyangZhou / CoOp
View on GitHub
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
☆2,218May 20, 2024Updated 2 years ago