salesforce / BLIPLinks

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

☆5,525

Alternatives and similar repositories for BLIP

Users that are interested in BLIP are comparing it to the libraries listed below

Sorting:

salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆10,944Updated 10 months ago
mlfoundations / open_clip
An open source implementation of CLIP.
☆12,741Updated 3 weeks ago
rom1504 / img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
☆4,187Updated 3 weeks ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,516Updated last year
OFA-Sys / OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…
☆2,533Updated last year
mlfoundations / open_flamingo
An open-source framework for training large multimodal models.
☆4,021Updated last year
rom1504 / clip-retrieval
Easily compute clip embeddings and build a clip retrieval system with them
☆2,660Updated 2 months ago
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,582Updated last year
salesforce / ALBEF
Code for ALBEF: a new vision-language pre-training method
☆1,716Updated 3 years ago
rmokady / CLIP_prefix_caption
Simple image captioning model
☆1,393Updated last year
xinyu1205 / recognize-anything
Open-source and strong foundation image recognition models.
☆3,439Updated 8 months ago
IDEA-Research / GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
☆9,028Updated last year
openai / CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
☆31,013Updated last year
OpenGVLab / LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,904Updated last year
KaiyangZhou / CoOp
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
☆2,087Updated last year
gligen / GLIGEN
Open-Set Grounded Text-to-Image Generation
☆2,165Updated last year
facebookresearch / DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
☆7,889Updated last year
facebookresearch / dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
☆11,690Updated 2 months ago
lucidrains / CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
☆1,181Updated last year
facebookresearch / xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
☆9,989Updated last week
CompVis / latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
☆13,416Updated last year
timojl / clipseg
This repository contains the code of the CVPR 2022 paper "Image Segmentation Using Text and Image Prompts".
☆1,284Updated last year
cloneofsimo / lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
☆7,449Updated last year
google / prompt-to-prompt
☆3,394Updated last year
UX-Decoder / Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
☆4,728Updated last year
IDEA-Research / Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and …
☆17,021Updated last year
zai-org / CogVLM
a state-of-the-art-level open visual language model | 多模态预训练模型
☆6,676Updated last year
google-research / scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
☆3,688Updated last week
baaivision / Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,582Updated 10 months ago
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆23,763Updated last year