om-ai-lab / GroundVLP
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
☆58
Related projects
Alternatives and complementary repositories for GroundVLP
- The official implementation of RAR (☆74)
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding (☆109)
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs (☆77)
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution (☆39)
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want (☆61)
- [CVPR 2024] Official code for the paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" (☆80)
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision (☆24)
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" (☆78)
- [IEEE TCSVT] Official PyTorch implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation (☆35)
- Official implementation of the paper "OvarNet: Towards Open-vocabulary Object Attribute Recognition" (☆98)
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" (☆96)
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge (☆121)
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM (☆58)
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding (☆40)
- FreeVA: Offline MLLM as Training-Free Video Assistant (☆49)
- [ICCV 2023] Code for the paper "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" (☆94)
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" (☆56)
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning (☆44)
- Official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…" (☆64)
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context (☆132)
- MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? (☆78)
- Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" (☆82)
- Repository of the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models" (☆36)
- [CVPR 2024] PixelLM: an effective and efficient LMM for pixel-level reasoning and understanding (☆182)