linhuixiao/Awesome-Visual-Grounding

[TPAMI 2025] Towards Visual Grounding: A Survey

☆322

Alternatives and similar repositories for Awesome-Visual-Grounding

Users who are interested in Awesome-Visual-Grounding are comparing it to the repositories listed below.

linhuixiao / OneRef
[NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.
☆32 · Nov 13, 2025 · Updated 8 months ago
linhuixiao / HiVG
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
☆65 · Nov 10, 2025 · Updated 8 months ago
MightXiong / FedMIT
☆13 · Mar 14, 2025 · Updated last year
Dmmm1997 / SimVG
[NeurIPS 2024] SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
☆103 · Oct 29, 2025 · Updated 8 months ago
linhuixiao / CLIP-VG
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
☆135 · Nov 10, 2025 · Updated 8 months ago
sunzc-sunny / refdrone
RefDrone: A Challenging Benchmark for Drone Scene Referring Expression Comprehension
☆44 · Jul 8, 2026 · Updated 2 weeks ago
LANMNG / LQVG
☆32 · Nov 27, 2025 · Updated 7 months ago
Dmmm1997 / C3VG
[AAAI 2025, selected as oral] Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
☆45 · Jul 2, 2025 · Updated last year
Ideal-ljl / AerialVG
☆22 · Dec 2, 2025 · Updated 7 months ago
like413 / OPT-RSVG
[TGRS 2024] Language-Guided Progressive Attention for Visual Grounding in Remote Sensing Images.
☆56 · Jun 10, 2025 · Updated last year
MCG-NJU / Dynamic-MDETR
[TPAMI 2024] Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
☆29 · Sep 11, 2024 · Updated last year
Dmmm1997 / InstanceVG
[TPAMI 2025] Improving Generalized Visual Grounding with Instance-aware Joint Learning
☆33 · Apr 28, 2026 · Updated 2 months ago
LukeForeverYoung / QRNet
☆41 · Jun 3, 2022 · Updated 4 years ago
liudaizong / Awesome-3D-Visual-Grounding
😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
☆282 · Jan 14, 2026 · Updated 6 months ago
TheShadow29 / awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
☆1,127 · Sep 21, 2025 · Updated 10 months ago
Charles-Xie / awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring E…
☆357 · Nov 6, 2025 · Updated 8 months ago
djiajunustc / TransVG
☆198 · Feb 27, 2024 · Updated 2 years ago
iSEE-Laboratory / ReferDINO
(ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
☆142 · Nov 14, 2025 · Updated 8 months ago
function2-llx / MMMM
[NAACL 2025] VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
☆31 · Mar 10, 2025 · Updated last year
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆47 · Oct 19, 2025 · Updated 9 months ago
yangli18 / VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
☆97 · Dec 2, 2022 · Updated 3 years ago
PKU-ICST-MIPL / DyFo_CVPR2025
☆116 · Aug 14, 2025 · Updated 11 months ago
LeapLabTHU / GSVA
[CVPR 2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
☆166 · Sep 12, 2024 · Updated last year
ZhanYang-nwpu / Awesome-Multimodal-Large-Language-Models-for-UAV-Vision-Language-Perception
UAV-MLLMs
☆29 · Apr 7, 2026 · Updated 3 months ago
nnnth / UFO
[NeurIPS 2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…
☆280 · Nov 5, 2025 · Updated 8 months ago
jianzongwu / Awesome-Open-Vocabulary
(TPAMI 2024) A Survey on Open Vocabulary Learning
☆998 · May 12, 2026 · Updated 2 months ago
wusize / F-LMM
[CVPR 2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆115 · May 29, 2025 · Updated last year
Mr-Bigworth / MMCA
Visual Grounding with Multi-modal Conditional Adaptation (ACM MM 2024 Oral)
☆26 · Jun 11, 2025 · Updated last year
jefferyZhan / Griffon
Official repo of the Griffon series, including v1 (ECCV 2024), v2 (ICCV 2025), G, and R, and also the RL tool Vision-R1 (CVPR 2026).
☆250 · Apr 17, 2026 · Updated 3 months ago
UCSB-AI / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆191 · Jan 16, 2026 · Updated 6 months ago
JierunChen / Ref-L4
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
☆61 · Dec 28, 2024 · Updated last year
gaoyingjay / TTAOD-F
Implementation of the paper "Test-Time Adaptive Object Detection with Foundation Model" (NeurIPS 2025)
☆22 · Jan 30, 2026 · Updated 5 months ago
seilk / LocalizationHeads
[CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
☆79 · Aug 31, 2025 · Updated 10 months ago
Tavarich / Awesome-Referring-Video-Object-Segmentation
A list of referring video object segmentation papers
☆63 · Jun 28, 2026 · Updated 3 weeks ago
forXuyx / Cinego
🚀 Lightweight video 🎥 large model 🤖
☆23 · Apr 27, 2025 · Updated last year
JIA-Lab-research / Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆635 · Jan 17, 2026 · Updated 6 months ago
YifanXu74 / Libra
Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)
☆145 · Nov 29, 2024 · Updated last year
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆216 · Oct 15, 2025 · Updated 9 months ago
CAESAR-Radi / TACMT
Multimodal fusion
☆22 · Dec 25, 2024 · Updated last year