lyan62 / FoodieQA
Official Repo for FoodieQA paper (EMNLP 2024)
☆16 · Updated this week
Alternatives and similar repositories for FoodieQA
Users who are interested in FoodieQA are comparing it to the repositories listed below
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 · Updated 8 months ago
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality ☆32 · Updated last month
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space' ☆15 · Updated 11 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆87 · Updated 6 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆51 · Updated 8 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆75 · Updated 7 months ago
- Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering ☆60 · Updated 7 months ago
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆86 · Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆11 · Updated 8 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆36 · Updated last week
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding ☆35 · Updated last month
- [ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning ☆19 · Updated this week
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception. ☆34 · Updated 3 months ago
- Official Repository of LatentSeek ☆49 · Updated 3 weeks ago
- codes for Efficient Test-Time Scaling via Self-Calibration ☆14 · Updated 3 months ago
- Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆75 · Updated last month
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆30 · Updated last month
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆22 · Updated 4 months ago
- EMPO, A Fully Unsupervised RLVR Method ☆40 · Updated 2 weeks ago
- [EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs ☆12 · Updated 6 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Updated 6 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆76 · Updated last year
- ☆24 · Updated 2 months ago
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag… ☆32 · Updated last year
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆31 · Updated this week
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 · Updated last year
- ☆13 · Updated last month
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? ☆24 · Updated 3 months ago
- ☆57 · Updated 7 months ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability" ☆32 · Updated 11 months ago