tsunghan-wu / reverse_vlmLinks

🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"

☆40

Alternatives and similar repositories for reverse_vlm

Users that are interested in reverse_vlm are comparing it to the libraries listed below

Sorting:

Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆63Updated 6 months ago
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’
☆52Updated last month
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆27Updated this week
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆59Updated 4 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆85Updated 10 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32Updated 5 months ago
GuangyanS / Sys2-LLaVA
☆26Updated 5 months ago
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆115Updated last week
showlab / UniRL
The code repository of UniRL
☆36Updated 2 months ago
Becomebright / ReKV
Official PyTorch Code of ReKV (ICLR'25)
☆36Updated 4 months ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆180Updated last month
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆77Updated last year
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆97Updated last month
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆21Updated 4 months ago
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆98Updated 9 months ago
sled-group / moh
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models
☆29Updated 8 months ago
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆72Updated 10 months ago
yellow-binary-tree / MMDuet
Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…
☆33Updated 6 months ago
xing0047 / cca-llava
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
☆58Updated 7 months ago
TencentARC / SEED-Bench-R1
☆87Updated last month
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆30Updated 9 months ago
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆60Updated 5 months ago
Aurora-slz / MM-Verify
☆13Updated 5 months ago
yu-rp / VisualPerceptionToken
☆93Updated 4 months ago
Tencent / HaploVLM
ICML2025
☆51Updated 2 months ago
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆66Updated last year
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆63Updated 3 weeks ago
ruili33 / TPO
☆35Updated 6 months ago
TencentARC / GRPO-CARE
☆67Updated last month
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated last month