SY-Xuan/Pink

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/SY-Xuan/Pink)

SY-Xuan / Pink

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

☆100

Alternatives and similar repositories for Pink

Users that are interested in Pink are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

JiuTian-VL / JiuTian-LION
View on GitHub
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
☆154Sep 3, 2025Updated 10 months ago
ChengHan111 / VPT-or-FT
View on GitHub
Official Pytorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning'? (ICLR2024)
☆13Mar 8, 2024Updated 2 years ago
mightyzau / RegionBLIP
View on GitHub
☆59Aug 7, 2023Updated 2 years ago
shikras / shikra
View on GitHub
☆814Jul 8, 2024Updated 2 years ago
findalexli / mllm-dpo
View on GitHub
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆48Nov 10, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
jshilong / GPT4RoI
View on GitHub
(ECCVW 2025)GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556Jun 3, 2025Updated last year
Yuqifan1117 / HalluciDoctor
View on GitHub
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆52Jul 16, 2024Updated 2 years ago
Chat-3D / Chat-3D
View on GitHub
Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes"
☆57Mar 28, 2024Updated 2 years ago
PVIT-official / PVIT
View on GitHub
Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
☆37Sep 19, 2023Updated 2 years ago
FreedomIntelligence / TRIM
View on GitHub
We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22Jan 11, 2026Updated 6 months ago
BAAI-DCAI / Visual-Instruction-Tuning
View on GitHub
SVIT: Scaling up Visual Instruction Tuning
☆167Jun 20, 2024Updated 2 years ago
MaverickRen / PixelLM
View on GitHub
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273Feb 11, 2025Updated last year
FreedomIntelligence / ALLaVA
View on GitHub
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281Jun 25, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
princetonvisualai / pointingqa
View on GitHub
Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"
☆19Oct 4, 2022Updated 3 years ago
Hon-Wong / ByteVideoLLM
View on GitHub
[ICCV 2025] Dynamic-VLM
☆28Dec 16, 2024Updated last year
mbzuai-oryx / groundingLMM
View on GitHub
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆963Aug 5, 2025Updated 11 months ago
FuxiaoLiu / LRV-Instruction
View on GitHub
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297Mar 13, 2024Updated 2 years ago
YuchenLiu98 / COMM
View on GitHub
Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
☆211Jan 8, 2025Updated last year
CASIA-IVA-Lab / MRES
View on GitHub
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…
☆74Jun 3, 2024Updated 2 years ago
cjhing / OS-FPI
View on GitHub
Official repository of OS-FPI
☆17Dec 22, 2024Updated last year
WisconsinAIVision / ViP-LLaVA
View on GitHub
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆338Jul 17, 2024Updated 2 years ago
mlpc-ucsd / BLIVA
View on GitHub
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
☆261Apr 14, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
cv516Buaa / OV-VG
View on GitHub
☆31Mar 25, 2024Updated 2 years ago
wusize / F-LMM
View on GitHub
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆115May 29, 2025Updated last year
JIA-Lab-research / LISA
View on GitHub
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,666Feb 16, 2025Updated last year
RLHF-V / RLHF-V
View on GitHub
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆310Sep 11, 2024Updated last year
SCNU203 / GeoQA-Plus
View on GitHub
☆20May 14, 2024Updated 2 years ago
luogen1996 / LLaVA-HR
View on GitHub
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249Aug 14, 2024Updated last year
YiyangZhou / LURE
View on GitHub
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆158Apr 30, 2024Updated 2 years ago
HaozheZhao / MIC
View on GitHub
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆361Dec 18, 2023Updated 2 years ago
Gary-code / KECVQG
View on GitHub
[ACM MM 2023] The released code of paper "Deconfounded Visual Question Generation with Causal Inference"
☆10Sep 3, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
yuecao0119 / MMFuser
View on GitHub
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆63Nov 5, 2024Updated last year
OpenGVLab / VisionLLM
View on GitHub
VisionLLM Series
☆1,152Feb 27, 2025Updated last year
mightyzau / InfMLLM
View on GitHub
☆19Dec 6, 2023Updated 2 years ago
ylingfeng / FGVP
View on GitHub
Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023
☆57Feb 1, 2024Updated 2 years ago
XYPB / CLEFT
View on GitHub
Official Implementation of "CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning" on MIC…
☆18Feb 12, 2025Updated last year
baaivision / DenseFusion
View on GitHub
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159Dec 6, 2024Updated last year
GasolSun36 / MVP
View on GitHub
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
☆24Sep 9, 2024Updated last year