xk-huang/segment-caption-anything

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xk-huang/segment-caption-anything)

xk-huang / segment-caption-anything

[CVPR'24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA) , links for downloading the trained model checkpoints, and example notebooks / gradio demo that show how to use the model.

☆232

Alternatives and similar repositories for segment-caption-anything

Users that are interested in segment-caption-anything are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

AndyTang15 / FLAG3Dv2
View on GitHub
☆25May 9, 2024Updated 2 years ago
AndyTang15 / FLAG3D
View on GitHub
☆19Jun 22, 2026Updated 3 weeks ago
facebookresearch / DCI
View on GitHub
Densely Captioned Images (DCI) dataset repository.
☆197Jul 1, 2024Updated 2 years ago
shiyi-zh0408 / NAE_CVPR2024
View on GitHub
[CVPR 2024] Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
☆43May 16, 2024Updated 2 years ago
EternalEvan / DPMesh
View on GitHub
The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery", CVPR 2024
☆45Jun 4, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
Yxxxb / LAVT-RS
View on GitHub
[CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation
☆26Jan 21, 2025Updated last year
Tengbo-Yu / AnyBimanual
View on GitHub
[ICCV2025] AnyBimanual: Transfering Unimanual Policy for General Bimanual Manipulation
☆103Jun 26, 2025Updated last year
SuleBai / SC-CLIP
View on GitHub
[TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
☆72Mar 27, 2026Updated 3 months ago
zhang9302002 / ThinkingWithVideos
View on GitHub
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆101Oct 15, 2025Updated 9 months ago
yongliu20 / SCAN
View on GitHub
[CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration"
☆77Sep 23, 2024Updated last year
mbzuai-oryx / groundingLMM
View on GitHub
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963Aug 5, 2025Updated 11 months ago
shiyi-zh0408 / Meta-CoT
View on GitHub
[CVPR 2026] Official code of the paper "Meta-CoT: Enhancing Granularity and Generalization in Image Editing"
☆78May 6, 2026Updated 2 months ago
yongliu20 / Awesome-Unified-Understanding-and-Generation
View on GitHub
☆52Aug 22, 2025Updated 10 months ago
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507Aug 9, 2024Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
shiyi-zh0408 / LOGO
View on GitHub
[CVPR 2023] LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
☆48Apr 9, 2024Updated 2 years ago
AMAP-ML / UniVG-R1
View on GitHub
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
☆165Jun 2, 2025Updated last year
UX-Decoder / Semantic-SAM
View on GitHub
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
☆2,852Jul 10, 2025Updated last year
Yxxxb / VoCo-LLaMA
View on GitHub
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205Jun 18, 2025Updated last year
ttengwang / Caption-Anything
View on GitHub
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with dive…
☆1,775Aug 29, 2023Updated 2 years ago
jshilong / GPT4RoI
View on GitHub
(ECCVW 2025)GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556Jun 3, 2025Updated last year
ucas-vg / Sambor
View on GitHub
Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning
☆32Dec 7, 2023Updated 2 years ago
GuanxingLu / ManiGaussian
View on GitHub
[ECCV 2024] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
☆282Mar 29, 2026Updated 3 months ago
lizhou-cs / mglmm
View on GitHub
☆32Jun 14, 2026Updated last month
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
baaivision / tokenize-anything
View on GitHub
[ECCV 2024] Tokenize Anything via Prompting
☆601Dec 11, 2024Updated last year
SunzeY / AlphaCLIP
View on GitHub
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
☆876Jul 20, 2025Updated last year
fanq15 / Stable-SAM
View on GitHub
☆73Dec 6, 2023Updated 2 years ago
facebookresearch / paco
View on GitHub
This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts…
☆300Feb 12, 2024Updated 2 years ago
JialianW / GRiT
View on GitHub
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
☆341Jan 8, 2024Updated 2 years ago
Shengcao-Cao / groundLMM
View on GitHub
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆47Oct 19, 2025Updated 9 months ago
berkeley-hipie / segllm
View on GitHub
Code release for "SegLLM: Multi-round Reasoning Segmentation"
☆129Feb 20, 2025Updated last year
baaivision / CapsFusion
View on GitHub
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆215Feb 27, 2024Updated 2 years ago
Dai-Wenxun / MotionLCM
View on GitHub
[ ECCV 2024 ] MotionLCM: This repo is the official implementation of "MotionLCM: Real-time Controllable Motion Generation via Latent Cons…
☆462Feb 24, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
IVGSZ / Flash-VStream
View on GitHub
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆285Oct 15, 2025Updated 9 months ago
JIA-Lab-research / Prompt-Highlighter
View on GitHub
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
☆159Jul 23, 2024Updated last year
FudanNLPLAB / MouSi
View on GitHub
☆75Mar 7, 2024Updated 2 years ago
FoundationVision / UniRef
View on GitHub
[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces
☆238Feb 14, 2025Updated last year
yongliu20 / GSFM
View on GitHub
[ECCV2022] Global Spectral Filter Memory Network for Video Object Segmentation
☆42Jul 13, 2022Updated 4 years ago
franciszzj / OpenPSG
View on GitHub
[ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
☆50Jan 8, 2025Updated last year
UX-Decoder / DINOv
View on GitHub
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
☆542Apr 8, 2024Updated 2 years ago