dvlab-research / Prompt-Highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

☆157

Alternatives and similar repositories for Prompt-Highlighter

Users who are interested in Prompt-Highlighter are comparing it to the repositories listed below.

MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆201 Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147 Updated last year
PKU-YuanGroup / Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
☆136 Updated 2 years ago
foundation-multimodal-models / CAPTURE
☆81 Updated last year
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆92 Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95 Updated 2 years ago
isekai-portal / Link-Context-Learning
☆100 Updated last year
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆158 Updated 3 months ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆153 Updated 3 months ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆172 Updated 2 months ago
imagegridworth / IG-VLM
☆140 Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 Updated 2 years ago
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆69 Updated 11 months ago
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆62 Updated 10 months ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
☆237 Updated 9 months ago
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆158 Updated last year
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆202 Updated 6 months ago
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆137 Updated 6 months ago
EvolvingLMMs-Lab / VideoMMMU
☆62 Updated 3 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168 Updated last year
rongyaofang / PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
☆130 Updated 11 months ago
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
☆147 Updated last year
Yanqing0327 / MLLMs-Augmented
The official implementation of "MLLMs-Augmented Visual-Language Representation Learning"
☆31 Updated last year
alibaba / conv-llava
☆124 Updated last year
patrick-tssn / VideoHallucer
VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆41 Updated 2 weeks ago
longvideobench / LongVideoBench
[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆112 Updated last year
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆59 Updated last year
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆92 Updated last month
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆161 Updated last year
luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆246 Updated last year