showlab/Image2Paragraph

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/showlab/Image2Paragraph)

showlab / Image2Paragraph

[Image 2 Text Para] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.

☆822

Alternatives and similar repositories for Image2Paragraph

Users that are interested in Image2Paragraph are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

JialianW / GRiT
View on GitHub
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
☆341Jan 8, 2024Updated 2 years ago
ttengwang / Caption-Anything
View on GitHub
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with dive…
☆1,775Aug 29, 2023Updated 2 years ago
showlab / VLog
View on GitHub
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
☆587Mar 13, 2025Updated last year
Vision-CAIR / ChatCaptioner
View on GitHub
Official Repository of ChatCaptioner
☆468Apr 13, 2023Updated 3 years ago
salesforce / LAVIS
View on GitHub
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,257Jun 2, 2026Updated last month
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
showlab / cosmo
View on GitHub
☆75May 10, 2024Updated 2 years ago
EvolvingLMMs-Lab / RelateAnything
View on GitHub
Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.
☆471Jul 4, 2023Updated 3 years ago
VainF / Awesome-Anything
View on GitHub
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
☆1,852Nov 15, 2023Updated 2 years ago
showlab / ShowAnything
View on GitHub
☆83Aug 1, 2023Updated 2 years ago
showlab / all-in-one
View on GitHub
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281Mar 25, 2023Updated 3 years ago
microsoft / X-Decoder
View on GitHub
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346Oct 5, 2023Updated 2 years ago
UX-Decoder / Segment-Everything-Everywhere-All-At-Once
View on GitHub
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
☆4,794Aug 19, 2024Updated last year
sail-sg / EditAnything
View on GitHub
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
☆3,425Feb 23, 2025Updated last year
fudan-zvg / Semantic-Segment-Anything
View on GitHub
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
☆2,301Jun 7, 2023Updated 3 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
IDEA-Research / Grounded-Segment-Anything
View on GitHub
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and …
☆17,688Sep 5, 2024Updated last year
OpenGVLab / Ask-Anything
View on GitHub
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
☆3,344Jul 17, 2026Updated last week
allenai / mmc4
View on GitHub
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
☆953Mar 19, 2025Updated last year
open-mmlab / Multimodal-GPT
View on GitHub
Multimodal-GPT
☆1,512Jun 4, 2023Updated 3 years ago
microsoft / GLIP
View on GitHub
Grounded Language-Image Pre-training
☆2,605Jan 24, 2024Updated 2 years ago
opendatalab / CLIP-Parrot-Bias
View on GitHub
ECCV2024_Parrot Captions Teach CLIP to Spot Text
☆66Sep 6, 2024Updated last year
baaivision / Emu
View on GitHub
Emu Series: Generative Multimodal Models from BAAI
☆1,776Jan 12, 2026Updated 6 months ago
mlfoundations / open_flamingo
View on GitHub
An open-source framework for training large multimodal models.
☆4,118Aug 31, 2024Updated last year
baaivision / Painter
View on GitHub
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,593Dec 6, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
OpenGVLab / LLaMA-Adapter
View on GitHub
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,915Mar 14, 2024Updated 2 years ago
sail-sg / ptp
View on GitHub
[CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》
☆150Jun 7, 2023Updated 3 years ago
X-PLUG / mPLUG-Owl
View on GitHub
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,537Apr 2, 2025Updated last year
OFA-Sys / OFA
View on GitHub
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence L…
☆2,557Apr 24, 2024Updated 2 years ago
haotian-liu / LLaVA
View on GitHub
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,952Aug 12, 2024Updated last year
xinyu1205 / recognize-anything
View on GitHub
Open-source and strong foundation image recognition models.
☆3,696Feb 18, 2025Updated last year
facebookresearch / LaViLa
View on GitHub
Code release for "Learning Video Representations from Large Language Models"
☆534Oct 1, 2023Updated 2 years ago
mlfoundations / datacomp
View on GitHub
DataComp: In search of the next generation of multimodal datasets
☆787Apr 28, 2025Updated last year
TencentARC / T2I-Adapter
View on GitHub
T2I-Adapter
☆3,803Jun 21, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
weijiawu / ParaDiffusion
View on GitHub
[IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model
☆107Mar 24, 2025Updated last year
baaivision / EVA
View on GitHub
EVA Series: Visual Representation Fantasies from BAAI
☆2,685Aug 1, 2024Updated last year
AILab-CVC / SEED
View on GitHub
Official implementation of SEED-LLaMA (ICLR 2024).
☆642Sep 21, 2024Updated last year
shikras / shikra
View on GitHub
☆814Jul 8, 2024Updated 2 years ago
rom1504 / img2dataset
View on GitHub
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
☆4,438Oct 19, 2025Updated 9 months ago
gligen / GLIGEN
View on GitHub
Open-Set Grounded Text-to-Image Generation
☆2,226Mar 6, 2024Updated 2 years ago
mlfoundations / open_clip
View on GitHub
An open source implementation of CLIP.
☆14,032Jul 17, 2026Updated last week