Shengcao-Cao/groundLMM

Shengcao-Cao / groundLMM

Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision

☆47

Alternatives and similar repositories for groundLMM

Users who are interested in groundLMM are comparing it to the repositories listed below.

wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆115 · May 29, 2025 · Updated last year
lizhou-cs / mglmm
☆32 · Jun 14, 2026 · Updated last month
rkzheng99 / ViLLa
Video Reasoning Segmentation
☆26 · Nov 29, 2024 · Updated last year
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
congvvc / LaSagnA
Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries".
☆63 · Apr 29, 2024 · Updated 2 years ago
see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆47 · Jun 16, 2024 · Updated 2 years ago
baoxiaoyi / CoReS
code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation"
☆23 · Nov 24, 2025 · Updated 8 months ago
dahyun-kang / lavg
[ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
☆51 · Sep 24, 2024 · Updated last year
LeapLabTHU / GSVA
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
☆166 · Sep 12, 2024 · Updated last year
cilinyan / ReVOS-api
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆22 · Jul 20, 2024 · Updated 2 years ago
sangminwoo / RITUAL
Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…
☆14 · Dec 16, 2024 · Updated last year
cilinyan / VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆214 · Aug 5, 2024 · Updated last year
congvvc / InstructSeg
[ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"
☆56 · Feb 10, 2025 · Updated last year
iLearn-Lab / ACM-MM25-PUMA
[ACM MM 2025] PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning
☆18 · Jun 6, 2026 · Updated last month
zamling / PSALM
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
☆269 · Dec 30, 2024 · Updated last year
Huntersxsx / RIS-Learning-List
Related papers about Referring Image Segmentation (RIS)
☆16 · Dec 26, 2023 · Updated 2 years ago
d-ailin / CLIP-Guided-Decoding
☆18 · Aug 1, 2024 · Updated last year
Han-Zongbo / Skip-n
This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.
☆15 · Feb 12, 2024 · Updated 2 years ago
mrwu-mac / R-Bench
[ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'
☆24 · Jan 1, 2025 · Updated last year
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273 · Feb 11, 2025 · Updated last year
showlab / VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆149 · Dec 26, 2024 · Updated last year
wangjunchi / LLMSeg
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
☆194 · Apr 16, 2024 · Updated 2 years ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆81 · Oct 25, 2024 · Updated last year
LinfengYuan1997 / LoSh
[CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
☆13 · Jun 17, 2024 · Updated 2 years ago
sunsmarterjie / ChatterBox
[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆61 · May 2, 2025 · Updated last year
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1(CVPR 2026).
☆250 · Apr 17, 2026 · Updated 3 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆183 · Oct 14, 2024 · Updated last year
Vibashan / PosSAM
Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything
☆71 · Apr 7, 2024 · Updated 2 years ago
yellow-binary-tree / HawkEye
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆47 · Apr 29, 2024 · Updated 2 years ago
MCG-NKU / ExperiCV
Initial code for computer vision experiments
☆11 · Jan 1, 2023 · Updated 3 years ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69 · Jun 9, 2024 · Updated 2 years ago
takomc / amp
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆22 · Sep 26, 2024 · Updated last year
lxtGH / DenseWorld-1M
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
☆129 · Oct 2, 2025 · Updated 9 months ago
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆216 · Oct 15, 2025 · Updated 9 months ago
ParadoxZW / LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
☆35 · Aug 12, 2024 · Updated last year
RobertLuo1 / NeurIPS2023_SOC
[NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
☆33 · Mar 16, 2024 · Updated 2 years ago
OpenGVLab / all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆507 · Aug 9, 2024 · Updated last year
ByungKwanLee / Meteor
[NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…
☆116 · May 30, 2024 · Updated 2 years ago
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆39 · Mar 4, 2024 · Updated 2 years ago