tsunghan-wu / reverse_vlm
🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"
☆22 · Updated last week
Alternatives and similar repositories for reverse_vlm:
Users interested in reverse_vlm are comparing it to the repositories listed below:
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆50 · Updated 3 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆28 · Updated last month
- ☆13 · Updated 7 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆25 · Updated 7 months ago
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆27 · Updated 8 months ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL) ☆67 · Updated this week
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆43 · Updated 2 months ago
- ☆22 · Updated 10 months ago
- ☆77 · Updated 3 weeks ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆20 · Updated 8 months ago
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆14 · Updated last week
- ☆40 · Updated 3 weeks ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆55 · Updated 2 months ago
- [ICLR 2025] Video Action Differencing ☆36 · Updated last month
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆28 · Updated 6 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆66 · Updated 5 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆28 · Updated 3 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆37 · Updated 10 months ago
- ☆33 · Updated 2 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆75 · Updated 4 months ago
- ☆19 · Updated 5 months ago
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation ☆36 · Updated 3 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- LEO: A powerful Hybrid Multimodal LLM ☆17 · Updated 3 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆42 · Updated 2 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆123 · Updated 9 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆46 · Updated last week
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆16 · Updated 6 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆56 · Updated last year
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper ☆84 · Updated last month