callsys/ControlCap

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/callsys/ControlCap)

callsys / ControlCap

[ECCV 2024] ControlCap: Controllable Region-level Captioning

☆81

Alternatives and similar repositories for ControlCap

Users that are interested in ControlCap are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

callsys / DynRefer
View on GitHub
[CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
☆59Mar 4, 2025Updated last year
callsys / GenPromp
View on GitHub
[ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization
☆57Nov 10, 2023Updated 2 years ago
franciszzj / OpenPSG
View on GitHub
[ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
☆50Jan 8, 2025Updated last year
callsys / GMPO
View on GitHub
[ICLR 2026] Geometric-Mean Policy Optimization
☆104Jan 26, 2026Updated 5 months ago
callsys / FlowText
View on GitHub
[ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation
☆13May 13, 2023Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
sunsmarterjie / ChatterBox
View on GitHub
[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆61May 2, 2025Updated last year
qiujihao19 / Artemis
View on GitHub
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27Apr 8, 2025Updated last year
linhuixiao / HiVG
View on GitHub
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
☆65Nov 10, 2025Updated 8 months ago
NMS05 / Patch-Aligned-Contrastive-Learning
View on GitHub
☆24Jul 8, 2023Updated 3 years ago
letitiabanana / PnP-OVSS
View on GitHub
[CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
☆18Jul 22, 2024Updated last year
wusize / F-LMM
View on GitHub
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆115May 29, 2025Updated last year
wuw2019 / LoTLIP
View on GitHub
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆49Jan 14, 2025Updated last year
mrwu-mac / ControlMLLM
View on GitHub
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆210Jul 17, 2025Updated last year
Vibashan / PosSAM
View on GitHub
Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything
☆71Apr 7, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
XMUDeepLIT / AVG-LLaVA
View on GitHub
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33Oct 12, 2024Updated last year
dahyun-kang / lavg
View on GitHub
[ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
☆51Sep 24, 2024Updated last year
LinfengYuan1997 / LoSh
View on GitHub
[CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
☆13Jun 17, 2024Updated 2 years ago
slonetime / EBSeg
View on GitHub
[CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
☆41Jan 12, 2026Updated 6 months ago
mc-lan / ClearCLIP
View on GitHub
[ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
☆99Mar 26, 2025Updated last year
V3Det / V3Det
View on GitHub
☆121Jun 11, 2024Updated 2 years ago
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507Aug 9, 2024Updated last year
HKUST-LongGroup / RECODE
View on GitHub
[NeurIPS 2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
☆23Oct 21, 2025Updated 9 months ago
mingrui-wu / OSI-Bench
View on GitHub
Official repo of From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs
☆24Jun 23, 2026Updated 3 weeks ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
wusize / CLIPSelf
View on GitHub
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
☆207Feb 5, 2024Updated 2 years ago
Shengcao-Cao / groundLMM
View on GitHub
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆47Oct 19, 2025Updated 9 months ago
yinyueqin / DenseRewardRLHF-PPO
View on GitHub
This repository contains the code and released models for the paper Segmenting Text and Learning Their Rewards for Improved RLHF in Langu…
☆19Jan 8, 2025Updated last year
huuuuusy / videocube-toolkit
View on GitHub
The official python toolkit for running experiments and evaluate performance on VideoCube benchmark @TPAMI2023
☆31Apr 1, 2024Updated 2 years ago
aimagelab / freeda
View on GitHub
FreeDA: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)
☆50Aug 28, 2024Updated last year
SivanDoveh / DAC
View on GitHub
Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models
☆28Nov 29, 2023Updated 2 years ago
tydpan / NorCal
View on GitHub
☆30Oct 25, 2021Updated 4 years ago
yuecao0119 / MMInstruct
View on GitHub
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64Nov 7, 2024Updated last year
mlvlab / RALF
View on GitHub
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆47Sep 12, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
JarintotionDin / ZiRaGroundingDINO
View on GitHub
Official pytorch implementation of ZiRa, a method for incremental vision language object detection (IVLOD)，which has been accepted by Neu…
☆31Oct 22, 2024Updated last year
MzeroMiko / XDLM
View on GitHub
[ICML 2026 Spotlight] Code for miXed Discrete Diffusion Language Model
☆27Mar 16, 2026Updated 4 months ago
YS-IMTech / HyperDreamer
View on GitHub
(Siggraph Asia 2023) Project Page of "HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image"
☆10Dec 9, 2023Updated 2 years ago
JialianW / GRiT
View on GitHub
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
☆341Jan 8, 2024Updated 2 years ago
OVAD-Benchmark / ovad-benchmark-code
View on GitHub
OVAD: Open-vocabulary Attribute Detection code
☆30Aug 28, 2023Updated 2 years ago
callsys / TextVR
View on GitHub
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆31Dec 28, 2023Updated 2 years ago
lizhou-cs / mglmm
View on GitHub
☆32Jun 14, 2026Updated last month