lyan62 / FoodieQALinks

Official Repo for FoodieQA paper (EMNLP 2024)

☆16

Alternatives and similar repositories for FoodieQA

Users that are interested in FoodieQA are comparing it to the libraries listed below

Sorting:

luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆46Updated 11 months ago
keven980716 / weak-to-strong-deception
[ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆13Updated last year
tianyi-lab / Mosaic-IT
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
☆20Updated 3 weeks ago
MajorDavidZhang / MCL
code for Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
☆18Updated last year
ExplainableML / sae-for-vlm
[NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
☆37Updated 6 months ago
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆82Updated 11 months ago
kokolerk / TON
[NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆47Updated 2 weeks ago
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆88Updated last year
claws-lab / projection-in-MLLMs
Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'
☆16Updated last year
zefang-liu / AdaMoLE
AdaMoLE: Adaptive Mixture of LoRA Experts
☆37Updated last year
zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆18Updated last year
ShiZhengyan / DePT
[ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"
☆96Updated last year
nickjiang2378 / vlm-hallucinations
Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25)
☆86Updated 4 months ago
Pbihao / SLM
☆28Updated last year
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆80Updated last year
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆48Updated last year
jiaangli / VLCA
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16Updated 10 months ago
AlignGPT-VL / AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
☆33Updated last year
zeyofu / ReFocus_Code
Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]
☆39Updated 2 months ago
which47 / LLMCL
Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
☆35Updated 11 months ago
kigb / DropoutDecoding
[NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"
☆19Updated 10 months ago
shiqichen17 / VLM_Merging
Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆76Updated 3 weeks ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆66Updated 4 months ago
waltonfuture / MM-UPT
[NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
☆56Updated 3 weeks ago
PLUM-Lab / MixLoRA
Multimodal Instruction Tuning with Conditional Mixture of LoRA (ACL 2024)
☆31Updated last year
d-ailin / CLIP-Guided-Decoding
☆17Updated last year
arnab-api / romba
Applies ROME and MEMIT on Mamba-S4 models
☆14Updated last year
YuxiXie / V-DPO
Preference Learning for LLaVA
☆51Updated 11 months ago
Zayne-sprague / To-CoT-or-not-to-CoT
☆25Updated 6 months ago
s-vco / s-vco
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
☆15Updated 4 months ago