ttengwang/Caption-Anything

Caption-Anything is a versatile tool that combines image segmentation, visual captioning, and ChatGPT to generate tailored captions with diverse controls for user preferences. Hugging Face Spaces: https://huggingface.co/spaces/TencentARC/Caption-Anything and https://huggingface.co/spaces/VIPLab/Caption-Anything

☆1,775

Alternatives and similar repositories for Caption-Anything

Users who are interested in Caption-Anything are comparing it to the libraries listed below.

gaomingqi / Track-Anything
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI…
☆6,979 · Dec 13, 2025 · Updated 7 months ago
showlab / Image2Paragraph
[Image 2 Text Para] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
☆823 · Apr 28, 2023 · Updated 3 years ago
IDEA-Research / Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and …
☆17,672 · Sep 5, 2024 · Updated last year
salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,251 · Jun 2, 2026 · Updated last month
VainF / Awesome-Anything
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
☆1,852 · Nov 15, 2023 · Updated 2 years ago
UX-Decoder / Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
☆4,794 · Aug 19, 2024 · Updated last year
fudan-zvg / Semantic-Segment-Anything
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
☆2,301 · Jun 7, 2023 · Updated 3 years ago
baaivision / Painter
Painter & SegGPT Series: Vision Foundation Models from BAAI
☆2,593 · Dec 6, 2024 · Updated last year
OpenGVLab / Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
☆3,344 · Updated this week
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 6 months ago
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,928 · Aug 12, 2024 · Updated last year
facebookresearch / segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoi…
☆54,561 · Sep 18, 2024 · Updated last year
xinyu1205 / recognize-anything
Open-source and strong foundation image recognition models.
☆3,690 · Feb 18, 2025 · Updated last year
X-PLUG / mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
☆2,535 · Apr 2, 2025 · Updated last year
mlfoundations / open_flamingo
An open-source framework for training large multimodal models.
☆4,113 · Aug 31, 2024 · Updated last year
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,346 · Oct 5, 2023 · Updated 2 years ago
UX-Decoder / Semantic-SAM
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
☆2,850 · Jul 10, 2025 · Updated last year
Saiyan-World / grounded-segment-any-parts
Grounded Segment Anything: From Objects to Parts
☆416 · May 19, 2023 · Updated 3 years ago
sail-sg / EditAnything
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
☆3,425 · Feb 23, 2025 · Updated last year
IDEA-Research / GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
☆10,409 · Aug 12, 2024 · Updated last year
EvolvingLMMs-Lab / RelateAnything
Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.
☆471 · Jul 4, 2023 · Updated 3 years ago
facebookresearch / ImageBind
ImageBind: One Embedding Space to Bind Them All
☆9,060 · Nov 21, 2025 · Updated 7 months ago
JialianW / GRiT
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV 2024)
☆341 · Jan 8, 2024 · Updated 2 years ago
mlfoundations / open_clip
An open source implementation of CLIP.
☆13,999 · Updated this week
z-x-yang / Segment-and-Track-Anything
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary alg…
☆3,133 · Jul 3, 2026 · Updated 2 weeks ago
OpenGVLab / InternGPT
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBin…
☆3,204 · Aug 20, 2024 · Updated last year
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,686 · Aug 1, 2024 · Updated last year
shikras / shikra
☆814 · Jul 8, 2024 · Updated 2 years ago
SysCV / sam-hq
Segment Anything in High Quality [NeurIPS 2023]
☆4,243 · Sep 12, 2025 · Updated 10 months ago
CASIA-LMC-Lab / FastSAM
Fast Segment Anything
☆8,377 · Jul 30, 2024 · Updated last year
ZrrSkywalker / Personalize-SAM
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
☆1,665 · Jul 22, 2024 · Updated last year
facebookresearch / dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
☆13,116 · Jun 3, 2026 · Updated last month
open-mmlab / Multimodal-GPT
Multimodal-GPT
☆1,512 · Jun 4, 2023 · Updated 3 years ago
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year
IDEA-Research / OpenSeeD
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
☆762 · Jan 22, 2024 · Updated 2 years ago
salesforce / BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
☆5,714 · Mar 3, 2026 · Updated 4 months ago
geekyutao / Inpaint-Anything
Inpaint anything using Segment Anything and inpainting models.
☆7,655 · Feb 29, 2024 · Updated 2 years ago
Vision-CAIR / ChatCaptioner
Official Repository of ChatCaptioner
☆468 · Apr 13, 2023 · Updated 3 years ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,604 · Jan 24, 2024 · Updated 2 years ago