JIA-Lab-research/Prompt-Highlighter

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/JIA-Lab-research/Prompt-Highlighter)

JIA-Lab-research / Prompt-Highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

☆159

Alternatives and similar repositories for Prompt-Highlighter

Users that are interested in Prompt-Highlighter are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

JIA-Lab-research / Ref-NPR
View on GitHub
[CVPR 2023] Ref-NPR: Reference-Based Non-PhotoRealistic Radiance Fields
☆126Jul 7, 2023Updated 3 years ago
mrwu-mac / ControlMLLM
View on GitHub
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆211Jul 17, 2025Updated last year
snap-research / MyVLM
View on GitHub
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
☆188Jul 5, 2024Updated 2 years ago
JIA-Lab-research / GroupContrast
View on GitHub
[CVPR 2024] GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
☆44Mar 15, 2024Updated 2 years ago
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
CVMI-Lab / IST-Net
View on GitHub
(ICCV2023) IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation
☆120Dec 7, 2023Updated 2 years ago
AILab-CVC / SEED-Bench
View on GitHub
(CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆366Jan 14, 2025Updated last year
YiyangZhou / LURE
View on GitHub
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆158Apr 30, 2024Updated 2 years ago
tsb0601 / MMVP
View on GitHub
☆364Jan 27, 2024Updated 2 years ago
jiaangli / VILA
View on GitHub
[TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16Nov 22, 2024Updated last year
wusize / F-LMM
View on GitHub
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆115May 29, 2025Updated last year
JIA-Lab-research / RIVAL
View on GitHub
[NeurIPS 2023 Spotlight] Real-World Image Variation by Aligning Diffusion Inversion Chain
☆154Jan 2, 2024Updated 2 years ago
yuezih / less-is-more
View on GitHub
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58Oct 28, 2024Updated last year
LALBJ / PAI
View on GitHub
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆171Nov 6, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
deepcs233 / Visual-CoT
View on GitHub
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447Dec 22, 2024Updated last year
JIA-Lab-research / LLaMA-VID
View on GitHub
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861Jul 29, 2024Updated last year
AtsuMiyai / UPD
View on GitHub
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆82Mar 6, 2026Updated 4 months ago
shikiw / OPERA
View on GitHub
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo…
☆411Aug 24, 2024Updated last year
JIA-Lab-research / Video-P2P
View on GitHub
Video-P2P: Video Editing with Cross-attention Control
☆431Jun 30, 2025Updated last year
NVlabs / Long-RL
View on GitHub
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆727Sep 24, 2025Updated 10 months ago
findalexli / mllm-dpo
View on GitHub
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆48Nov 10, 2024Updated last year
TencentARC / MindOmni
View on GitHub
[NeurIPS2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
☆139Oct 15, 2025Updated 9 months ago
BRZ911 / Wrong-of-Thought
View on GitHub
[EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information
☆13Oct 1, 2024Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
baaivision / DenseFusion
View on GitHub
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159Dec 6, 2024Updated last year
RunpeiDong / DreamLLM
View on GitHub
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆462Dec 2, 2024Updated last year
agneet42 / revision
View on GitHub
[ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models"
☆14Aug 6, 2024Updated last year
OpenGVLab / all-seeing
View on GitHub
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of …
☆508Aug 9, 2024Updated last year
GongyeLiu / StyleCrafter-SDXL
View on GitHub
Code of StyleCrafter on SDXL
☆20Jun 25, 2024Updated 2 years ago
mu-cai / matryoshka-mm
View on GitHub
Matryoshka Multimodal Models
☆123Jan 22, 2025Updated last year
JIA-Lab-research / Q-LLM
View on GitHub
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
☆54Jul 16, 2024Updated 2 years ago
chancharikmitra / CCoT
View on GitHub
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142Jun 20, 2024Updated 2 years ago
IIGROUP / SCL
View on GitHub
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
☆20Dec 21, 2023Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
AFeng-x / Draw-and-Understand
View on GitHub
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆94Dec 1, 2025Updated 7 months ago
yuhui-zh15 / VLMClassifier
View on GitHub
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆98Oct 19, 2024Updated last year
sjtuytc / AAAI21-RoutineAugmentedPolicyLearning
View on GitHub
Source code to the AAAI21 publication Augmenting Policy Learning with Routines Discovered from a Single Demonstration
☆17Jan 7, 2021Updated 5 years ago
d-ailin / CLIP-Guided-Decoding
View on GitHub
☆18Aug 1, 2024Updated last year
FreedomIntelligence / TRIM
View on GitHub
We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22Jan 11, 2026Updated 6 months ago
JIA-Lab-research / MagicMirror
View on GitHub
[ICCV 2025] MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
☆130Jun 26, 2025Updated last year
EnVision-Research / DDSM
View on GitHub
Denoising Diffusion Step-aware Models (ICLR2024)
☆62Feb 6, 2024Updated 2 years ago