salesforce / BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
☆5,508 · Updated last year
Alternatives and similar repositories for BLIP
Users interested in BLIP are comparing it to the libraries listed below.
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆10,936 · Updated 10 months ago
- Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L… ☆2,538 · Updated last year
- An open-source implementation of CLIP. ☆12,676 · Updated 2 weeks ago
- Easily compute CLIP embeddings and build a CLIP retrieval system with them ☆2,653 · Updated last month
- Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine. ☆4,180 · Updated 2 weeks ago
- An open-source framework for training large multimodal models. ☆4,017 · Updated last year
- Grounded Language-Image Pre-training ☆2,507 · Updated last year
- EVA Series: Visual Representation Fantasies from BAAI ☆2,578 · Updated last year
- Code for ALBEF: a new vision-language pre-training method ☆1,713 · Updated 3 years ago
- Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch ☆1,178 · Updated last year
- Simple image captioning model ☆1,394 · Updated last year
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆3,161 · Updated 4 months ago
- A state-of-the-art open visual language model (multimodal pre-trained model) ☆6,668 · Updated last year
- CLIP (Contrastive Language-Image Pre-training): predict the most relevant text snippet given an image ☆30,921 · Updated last year
- Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22) ☆2,079 · Updated last year
- Scenic: A JAX Library for Computer Vision Research and Beyond ☆3,679 · Updated this week
- [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" ☆8,969 · Updated last year
- PyTorch code and models for the DINOv2 self-supervised learning method. ☆11,638 · Updated last month
- Open-source and strong foundation image recognition models. ☆3,431 · Updated 7 months ago
- mPLUG-Owl: The Powerful Multi-modal Large Language Model Family ☆2,519 · Updated 6 months ago
- ☆3,392 · Updated last year
- This repository contains the code of the CVPR 2022 paper "Image Segmentation Using Text and Image Prompts". ☆1,284 · Updated last year
- Open-Set Grounded Text-to-Image Generation ☆2,164 · Updated last year
- Taming Transformers for High-Resolution Image Synthesis ☆6,324 · Updated last year
- Painter & SegGPT Series: Vision Foundation Models from BAAI ☆2,582 · Updated 10 months ago
- Using low-rank adaptation (LoRA) to quickly fine-tune diffusion models. ☆7,443 · Updated last year
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). ☆1,217 · Updated last year
- Image to prompt with BLIP and CLIP ☆2,904 · Updated last year
- [NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once" ☆4,725 · Updated last year
- [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding ☆3,078 · Updated last year