[CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices". (by Junyan Lin)
☆44 · Oct 29, 2025 · Updated 4 months ago
Alternatives and similar repositories for Layer_Select_Fuse_for_MLLM
Users who are interested in Layer_Select_Fuse_for_MLLM are comparing it to the repositories listed below
- [ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". … ☆20 · Nov 17, 2025 · Updated 3 months ago
- "Visual Prompt Selection for In-Context Learning Segmentation Framework" ☆15 · Dec 13, 2024 · Updated last year
- Official PyTorch codebase for the "Modeling Caption Diversity in Contrastive Vision-Language Pretraining" paper. ☆18 · Mar 28, 2025 · Updated 11 months ago
- [ICLR 2026] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning ☆40 · Feb 22, 2026 · Updated last week
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆19 · Jul 20, 2024 · Updated last year
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval… ☆40 · Oct 14, 2023 · Updated 2 years ago
- [CVPR25] CoLLM: A Large Language Model for Composed Image Retrieval ☆28 · Mar 26, 2025 · Updated 11 months ago
- Recent Advances in Visual Dialog ☆30 · Aug 19, 2022 · Updated 3 years ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories ☆92 · Aug 8, 2025 · Updated 6 months ago
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training ☆38 · Mar 27, 2025 · Updated 11 months ago
- This repo is the official pytorch implementation of the paper: CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-V… ☆40 · Sep 10, 2025 · Updated 5 months ago
- The official repository of the paper 'Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine' ☆120 · Jan 9, 2025 · Updated last year
- This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV … ☆24 · Dec 4, 2025 · Updated 3 months ago
- source code for ICCV2021 paper "MFNet: Multi-filter Directive Network for Weakly Supervised Salient Object Detection" ☆11 · Jul 17, 2022 · Updated 3 years ago
- Vision Transformer (ViT) models, with their attention mechanisms, revolutionized computer vision. By merging Class Activation Map (CAM) a… ☆13 · Aug 14, 2023 · Updated 2 years ago
- [NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews) ☆48 · Apr 14, 2025 · Updated 10 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆47 · Oct 3, 2024 · Updated last year
- dMel: Speech Tokenization Made Simple ☆16 · May 13, 2025 · Updated 9 months ago
- Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector ☆11 · Jun 24, 2023 · Updated 2 years ago
- Reward Evolution with Large Language Models using Human Feedback ☆18 · Nov 14, 2025 · Updated 3 months ago
- Spectral Graph Attention Network with Fast Eigen-approximation ☆12 · Dec 24, 2021 · Updated 4 years ago
- Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm" ☆20 · Feb 20, 2026 · Updated last week
- Official implementation of "NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models" ☆18 · Jun 3, 2025 · Updated 9 months ago
- Interpreting CLIP with Hierarchical Sparse Autoencoders (ICML 2025) ☆20 · Jan 17, 2026 · Updated last month
- A simple deep-learning-based blurred image detector. ☆10 · Mar 29, 2023 · Updated 2 years ago
- Code for "HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation" CVPR2024 ☆10 · Apr 19, 2024 · Updated last year
- ☆14 · May 20, 2025 · Updated 9 months ago
- Pytorch implementation of 'Improving Self-supervised Lightweight Model Learning via Hard-aware Metric Distillation. In ECCV 2022' ☆12 · Mar 22, 2023 · Updated 2 years ago
- ☆12 · Mar 5, 2024 · Updated 2 years ago
- THE VISUAL COMPUTER “High-level LoRA and hierarchical fusion for enhanced micro-expression recognition” ☆13 · Oct 12, 2024 · Updated last year
- Tools for replaying Drake simulations in Blender ☆12 · Updated this week
- ☆12 · Aug 19, 2023 · Updated 2 years ago
- ☆24 · Oct 31, 2025 · Updated 4 months ago
- [ICML 2022 Spotlight] Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks ☆11 · May 21, 2023 · Updated 2 years ago
- [🔥ACM MM2025] EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation ☆23 · Dec 30, 2025 · Updated 2 months ago
- [AAAI 2023] Official implementation of FiTs: Fine-grained Two-stage Training for Knowledge Base Question Answering ☆11 · Mar 10, 2023 · Updated 2 years ago
- ☆12 · May 23, 2024 · Updated last year
- ☆13 · Jul 28, 2024 · Updated last year
- [ICLR 2025] This repo is the official implementation of "The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs". ☆13 · Jan 25, 2025 · Updated last year