CSU-JPG / Awesome-VLM-Reasoning
☆16, updated 2 months ago
Alternatives and similar repositories for Awesome-VLM-Reasoning
Users interested in Awesome-VLM-Reasoning are comparing it to the repositories listed below.
- [CVPR 2025 Oral] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key (☆69, updated 2 months ago)
- [ICML 2024] Official implementation of "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" (☆92, updated 8 months ago)
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs) (☆94, updated 8 months ago)
- Official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" (☆217, updated last year)
- 🔥 CVPR 2025 Multimodal Large Language Models paper list (☆149, updated 5 months ago)
- [NeurIPS 2023] Exploring Diverse In-Context Configurations for Image Captioning (☆40, updated 8 months ago)
- Latest open-source "thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain… (☆81, updated last month)
- MMICL, a state-of-the-art VLM with in-context learning ability, from PKU (☆49, updated 3 weeks ago)
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating (☆96, updated last year)
- The first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin… (☆75, updated last year)
- ☆134, updated 6 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (☆91, updated last year)
- [NeurIPS 2024 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … (☆360, updated 7 months ago)
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency (☆125, updated last week)
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models (☆71, updated last year)
- [NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (☆44, updated last year)
- An RLHF infrastructure for vision-language models (☆180, updated 8 months ago)
- A comprehensive survey on evaluating reasoning capabilities in multimodal large language models (☆68, updated 4 months ago)
- R1-like video LLM for temporal grounding (☆110, updated last month)
- Evaluating the robustness of adaptation methods on large vision-language models (☆19, updated last year)
- [CVPR 2024] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… (☆295, updated 8 months ago)
- Collections of papers and projects for multimodal reasoning (☆105, updated 3 months ago)
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding (☆302, updated 10 months ago)
- [ICLR 2024] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (☆284, updated last year)
- R1-Vision: Let's first take a look at the image (☆48, updated 5 months ago)
- Official repo of "MMBench: Is Your Multi-modal Model an All-around Player?" (☆239, updated 2 months ago)
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka (☆313, updated last month)
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency (☆47, updated 2 months ago)
- ☆152, updated 9 months ago
- [CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want (☆14, updated 7 months ago)