EIT-NLP/Layer_Select_Fuse_for_MLLM

[CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices". (by Junyan Lin)

☆49

Alternatives and similar repositories for Layer_Select_Fuse_for_MLLM

Users who are interested in Layer_Select_Fuse_for_MLLM are comparing it to the repositories listed below.

EIT-NLP / Connector-Selection-for-MLLM
[EMNLP 2024 Main] Official implementation of the paper "To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimoda…
☆16 · Dec 13, 2024 · Updated last year
EIT-NLP / BLEUless_DocMT
☆14 · Nov 19, 2024 · Updated last year
EIT-NLP / 2D-Coordinate-System-for-ICL
[EMNLP 2024 Main] Official implementation of the paper "Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mech…
☆15 · Oct 8, 2024 · Updated last year
EIT-NLP / Distilling-CoT-Reasoning
[ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".
☆22 · Feb 26, 2025 · Updated last year
phuselab / tppgaze
☆17 · Feb 20, 2025 · Updated last year
aimagelab / DiCO
[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
☆20 · Sep 11, 2024 · Updated last year
VLR-CVC / vlm-training
Large-scale pre-training of VLMs
☆25 · Jul 6, 2026 · Updated 2 weeks ago
MrZilinXiao / AutoVER
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14 · Mar 2, 2024 · Updated 2 years ago
facebookresearch / Llip
Official PyTorch codebase for the Modeling Caption Diversity in Contrastive Vision-Language Pretraining paper.
☆19 · Mar 28, 2025 · Updated last year
thunlp / hyperbolic_llm
☆12 · May 23, 2024 · Updated 2 years ago
cvlab-kaist / VIRAL
Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".
☆162 · Sep 21, 2025 · Updated 10 months ago
aimagelab / ScanDiff
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV …
☆27 · May 13, 2026 · Updated 2 months ago
Hxyz-123 / ReasoningOCR
☆18 · Jul 24, 2025 · Updated last year
EIT-NLP / StreamingLLM
Repository of Streaming LLMs
☆91 · Jun 20, 2026 · Updated last month
aimagelab / HySAC
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
☆31 · Apr 8, 2025 · Updated last year
LanqingL / SCS
"Visual Prompt Selection for In-Context Learning Segmentation Framework"
☆14 · Dec 13, 2024 · Updated last year
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆63 · Nov 5, 2024 · Updated last year
Qinyu-Allen-Zhao / LVLM-LP
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
☆43 · Nov 1, 2024 · Updated last year
HL-hanlin / Bifrost-1
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
☆47 · Nov 24, 2025 · Updated 8 months ago
mvrl / ConText-CIR
[CVPR'25] ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
☆16 · Jun 17, 2026 · Updated last month
aimagelab / MissRAG
[ICCV 2025] MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
☆26 · May 12, 2026 · Updated 2 months ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
☆29 · Apr 8, 2025 · Updated last year
WanyueZhang-ai / spatial-understanding
☆20 · Sep 3, 2025 · Updated 10 months ago
aimagelab / ReT-2
Recurrence Meets Transformers for Universal Multimodal Retrieval
☆15 · Dec 15, 2025 · Updated 7 months ago
cilinyan / ReVOS-api
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆22 · Jul 20, 2024 · Updated 2 years ago
EIT-NLP / Awesome-Latent-CoT
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
☆363 · Jun 20, 2026 · Updated last month
MengLcool / SliMM
☆25 · Dec 26, 2024 · Updated last year
manoja328 / TallyQA_dataset
TallyQA: Answering Complex Counting Questions dataset
☆31 · Feb 19, 2024 · Updated 2 years ago
VisionOPD / Vision-OPD
Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned percept…
☆201 · Jul 17, 2026 · Updated last week
YangYY-Liu / HybridCBM
☆16 · Jun 14, 2025 · Updated last year
aimagelab / VHS
[CVPR2026 Findings] VHS: Verifier on Hidden States, an efficient inference-time scaling verification framework for DiT-based image genera…
☆16 · Mar 25, 2026 · Updated 4 months ago
WangHanLinHenry / STeCa
(ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning"
☆29 · Mar 2, 2026 · Updated 4 months ago
ExplainableML / cosmos
[CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
☆42 · Mar 27, 2025 · Updated last year
j-river / svtr-pytorch
PyTorch version of the SVTR model
☆27 · May 24, 2022 · Updated 4 years ago
summitgao / SS-MAE
SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification (IEEE TGRS 2023)
☆59 · Mar 13, 2024 · Updated 2 years ago
ChocoWu / MRE-ISE
Code for the ACL 2023 paper: Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling.
☆22 · Jun 25, 2024 · Updated 2 years ago
ShawnHuang497 / MedPLIB
The official repository of the paper 'Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine'
☆133 · Jul 7, 2026 · Updated 2 weeks ago
meituan / MemOCR
MemOCR: an OCR-driven visual memory agent.
☆33 · May 17, 2026 · Updated 2 months ago
aimagelab / safe-clip
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models. ECCV 2024
☆67 · Aug 10, 2024 · Updated last year