WayneTomas / VPP-LLaVA

[TMM 2025] This is the official PyTorch code for our paper "Visual Position Prompt for MLLM based Visual Grounding".

☆20

Alternatives and similar repositories for VPP-LLaVA

Users who are interested in VPP-LLaVA are comparing it to the repositories listed below.

ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆67 Updated last week
HKUST-LongGroup / Awesome-MLLM-Benchmarks
☆129 Updated 5 months ago
palchenli / VL-Instruction-Tuning
☆91 Updated last year
yuyq96 / R1-Vision
R1-Vision: Let's first take a look at the image
☆48 Updated 5 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆69 Updated last year
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆69 Updated 5 months ago
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models
☆40 Updated this week
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆89 Updated last year
RUCAIBox / POPE
The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
☆212 Updated last year
xuanlinli17 / large_vlm_distillation_ood
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)
☆58 Updated last year
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆45 Updated 3 months ago
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆128 Updated 4 months ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
JiuTian-VL / JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
☆149 Updated last year
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆132 Updated last year
yfzhang114 / LLaVA-Align
This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…
☆78 Updated 4 months ago
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆163 Updated last year
Liuziyu77 / RAR
The official implementation of RAR
☆88 Updated last year
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆44 Updated last month
RifleZhang / LLaVA-Hound-DPO
☆152 Updated 8 months ago
Hui-design / R1-Video-fixbug
[Blog 1] Recording a bug of grpo_trainer in some R1 projects
☆20 Updated 4 months ago
ggjy / DeLVM
☆118 Updated last year
www-Ye / Time-R1
R1-like Video-LLM for Temporal Grounding
☆109 Updated 3 weeks ago
Code-kunkun / LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆128 Updated last week
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆47 Updated 4 months ago
bronyayang / HallE_Control
HallE-Control: Controlling Object Hallucination in LMMs
☆31 Updated last year
foundation-multimodal-models / ConBench
[NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
☆36 Updated 8 months ago
Wang-Xiaodong1899 / CVPR25-MLLM-Paper-List
🔥CVPR 2025 Multimodal Large Language Models Paper List
☆147 Updated 4 months ago
tsb0601 / MMVP
☆341 Updated last year
SivanDoveh / TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
☆46 Updated last year