tsunghan-wu / reverse_vlm
🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling (REVERSE)"
☆50 · Updated 3 months ago
Alternatives and similar repositories for reverse_vlm
Users that are interested in reverse_vlm are comparing it to the libraries listed below
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆31 · Updated 4 months ago
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆26 · Updated 10 months ago
- Official implementation of MIA-DPO ☆69 · Updated 11 months ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆59 · Updated 6 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆32 · Updated last year
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆45 · Updated 5 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆218 · Updated 5 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆33 · Updated last year
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆107 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆29 · Updated 9 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆83 · Updated 2 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 · Updated 6 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆33 · Updated 9 months ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention ☆66 · Updated 4 months ago
- [NeurIPS 2024] Repo for the paper 'ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models' ☆202 · Updated 5 months ago
- Official codebase for the paper Latent Visual Reasoning ☆69 · Updated 2 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆96 · Updated last year
- The code repository of UniRL ☆47 · Updated 7 months ago
- [ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆92 · Updated last year
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆153 · Updated 3 months ago
- The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? ☆41 · Updated last year
- ☆65 · Updated last month
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆77 · Updated 5 months ago
- [CVPR 2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆202 · Updated 6 months ago
- ☆35 · Updated last year
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆82 · Updated 10 months ago
- ☆28 · Updated 10 months ago
- [ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆42 · Updated last month
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆76 · Updated last year
- ICML 2025 ☆62 · Updated 4 months ago