tsunghan-wu / reverse_vlm
🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling (REVERSE)"
☆52 · Jan 22, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for reverse_vlm
Users who are interested in reverse_vlm are comparing it to the libraries listed below.
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆26 · Feb 9, 2025 · Updated last year
- Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents ☆93 · Feb 2, 2026 · Updated last week
- [arXiv 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought" ☆14 · Apr 3, 2025 · Updated 10 months ago
- The official implementation of Preference Data Reward-Augmentation. ☆18 · May 1, 2025 · Updated 9 months ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- [ICLR 2026] Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing ☆29 · Feb 6, 2026 · Updated last week
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆58 · Jan 26, 2026 · Updated 2 weeks ago
- ☆34 · Oct 9, 2025 · Updated 4 months ago
- ☆15 · Sep 11, 2025 · Updated 5 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆47 · Jun 16, 2024 · Updated last year
- Code release for AccDiffusionV2 (TPAMI) ☆35 · Nov 4, 2025 · Updated 3 months ago
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le… ☆13 · Jan 16, 2025 · Updated last year
- Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection ☆19 · Feb 5, 2026 · Updated last week
- ☆13 · Apr 23, 2025 · Updated 9 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" ☆57 · Jan 23, 2026 · Updated 3 weeks ago
- 🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆38 · Nov 21, 2025 · Updated 2 months ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆13 · Mar 17, 2025 · Updated 10 months ago
- [NeurIPS 2024] "Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection" ☆13 · Oct 28, 2024 · Updated last year
- Orienting Latent Actions for Video World Modeling ☆48 · Updated this week
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆49 · Jul 7, 2025 · Updated 7 months ago
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models ☆41 · Sep 30, 2024 · Updated last year
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025. ☆16 · Oct 20, 2025 · Updated 3 months ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding" ☆22 · Dec 8, 2024 · Updated last year
- [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images? ☆20 · Oct 20, 2025 · Updated 3 months ago
- Official implementation of StochSync: a zero-shot approach for image generation in arbitrary spaces via stochastic diffusion synchronizat… ☆19 · Jun 24, 2025 · Updated 7 months ago
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆61 · Nov 27, 2025 · Updated 2 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆39 · Jun 14, 2025 · Updated 8 months ago
- ☆17 · Apr 9, 2025 · Updated 10 months ago
- Recursive Visual Programming (ECCV 2024) ☆18 · Nov 20, 2024 · Updated last year
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts ☆17 · Mar 11, 2025 · Updated 11 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ☆27 · Feb 25, 2025 · Updated 11 months ago
- VHTest ☆15 · Oct 31, 2024 · Updated last year
- (ICCV 2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆45 · Jul 1, 2025 · Updated 7 months ago
- ☆42 · Jul 9, 2025 · Updated 7 months ago
- Cost-Sensitive Toolpath Agent for Multi-turn Image Editing ☆25 · Mar 26, 2025 · Updated 10 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆19 · Feb 14, 2025 · Updated last year
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 8 months ago
- Official Implementation of CODE ☆17 · Sep 26, 2024 · Updated last year
- A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAE's trained by Joseph Bloom o… ☆19 · Oct 4, 2024 · Updated last year