FightingFighting/cross-modal-information-flow-in-MLLM

This is the official repository for the paper "Cross-modal Information Flow in Multimodal Large Language Models".

☆44

Alternatives and similar repositories for cross-modal-information-flow-in-MLLM

Users who are interested in cross-modal-information-flow-in-MLLM are comparing it to the repositories listed below.

pumpkin805 / FALIP
[ECCV 2024] FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
☆18 · Sep 11, 2024 · Updated last year
bscho333 / ReVisiT
[ACL 2026 Main] Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
☆26 · Nov 21, 2025 · Updated 8 months ago
ZhangqiJiang07 / middle_layers_indicating_hallucinations
[CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att…
☆84 · Oct 9, 2025 · Updated 9 months ago
ZhengyaoFang / PruneSID
Official code for "Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity" (PruneSI…
☆13 · Mar 25, 2026 · Updated 4 months ago
naver-ai / muco
Official PyTorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15 · Apr 16, 2026 · Updated 3 months ago
WolodjaZ / MSAE
Interpreting CLIP with Hierarchical Sparse Autoencoders (ICML 2025)
☆28 · Jan 17, 2026 · Updated 6 months ago
obananas / HoloV
[NeurIPS 2025 🔥] Official implementation for "Don't Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retenti…
☆66 · Mar 5, 2026 · Updated 4 months ago
OmriKaduri / vlm-interp
Code for paper: "What’s in the Image? A Deep-Dive into the Vision of Vision Language Models" (CVPR 2025)
☆18 · May 1, 2025 · Updated last year
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆51 · Jul 7, 2025 · Updated last year
ustc-hyin / ClearSight
Code for paper: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
☆61 · Dec 18, 2024 · Updated last year
UCSB-AI / DMLR
[CVPR 2026] Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
☆85 · May 12, 2026 · Updated 2 months ago
tmlr-group / SCT
[NeurIPS 2024] "Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection"
☆13 · Oct 28, 2024 · Updated last year
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆71 · Feb 28, 2024 · Updated 2 years ago
jiangpin-legend / MR-HBA
☆28 · Feb 14, 2025 · Updated last year
zhangce01 / DeGF
[ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
☆26 · Apr 14, 2025 · Updated last year
The-Martyr / Awesome-Modality-Priors-in-MLLMs
Latest Advances on Modality Priors in Multimodal Large Language Models
☆30 · Dec 10, 2025 · Updated 7 months ago
Vinsonzyh / BlueSuffix
[ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
☆31 · Nov 2, 2025 · Updated 8 months ago
YangYY-Liu / HybridCBM
☆16 · Jun 14, 2025 · Updated last year
Sreyan88 / VDGD
Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
☆25 · May 7, 2025 · Updated last year
leorebensabath / TMRPlusPlus
☆25 · Mar 18, 2025 · Updated last year
kaiyuhwang / MLLM-Survey
The paper list of multilingual pre-trained models (Continually Updated).
☆25 · Jun 18, 2024 · Updated 2 years ago
shengliu66 / VTI
Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering
☆117 · Nov 23, 2024 · Updated last year
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆68 · Jul 16, 2024 · Updated 2 years ago
VincentLeebang / lvr
Official codebase for the paper Latent Visual Reasoning
☆171 · Oct 22, 2025 · Updated 9 months ago
XiaoyuXu-Vincent / step-saliency
Official code for paper "Reasoning Fails Where Step Flow Breaks" (ACL 2026)
☆18 · Apr 19, 2026 · Updated 3 months ago
DreamMr / RAP
Code for Retrieval-Augmented Perception (ICML 2025)
☆74 · Apr 22, 2026 · Updated 3 months ago
niejiahao1998 / MMRel
☆31 · Nov 17, 2024 · Updated last year
worldbench / SuperFlow
[ECCV 2024] 4D Contrastive Superflows are Dense 3D Representation Learners
☆52 · Dec 4, 2025 · Updated 7 months ago
fhgyuanshen / HybridGL
[CVPR 2025] Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
☆37 · Jun 27, 2025 · Updated last year
DripNowhy / ETA
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆34 · Jul 20, 2025 · Updated last year
Style3D / FashionR2R
☆32 · Oct 23, 2024 · Updated last year
shikras / d-cube
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating…
☆138 · Mar 20, 2024 · Updated 2 years ago
ylingfeng / FGVP
Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023
☆57 · Feb 1, 2024 · Updated 2 years ago
BillChan226 / HALC
[ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"
☆115 · Dec 4, 2024 · Updated last year
wrudman / NOTICE
☆14 · Apr 10, 2025 · Updated last year
showlab / datacentric.vlp
Compress conventional Vision-Language Pre-training data
☆52 · Sep 22, 2023 · Updated 2 years ago
zilunzhang / StreetCLIP-Repoduce
☆13 · Jul 1, 2024 · Updated 2 years ago
naver-ai / lut
[ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners"
☆14 · Dec 1, 2024 · Updated last year
zertow / TPNet
☆13 · Oct 25, 2024 · Updated last year