[CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices". (by Junyan Lin)
☆48Oct 29, 2025Updated 6 months ago
Alternatives and similar repositories for Layer_Select_Fuse_for_MLLM
Users that are interested in Layer_Select_Fuse_for_MLLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [EMNLP 2024 Main] Official implementation of the paper "To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimoda…☆17Dec 13, 2024Updated last year
- ☆14Nov 19, 2024Updated last year
- [ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". …☆22Nov 17, 2025Updated 5 months ago
- Official PyTorch codebase for the Modeling Caption Diversity in ContrastiveVision-Language Pretraining paper.☆18Mar 28, 2025Updated last year
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".☆22Feb 26, 2025Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- (ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning"☆26Mar 2, 2026Updated 2 months ago
- "Visual Prompt Selection for In-Context Learning Segmentation Framework"☆15Dec 13, 2024Updated last year
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation☆48Oct 3, 2024Updated last year
- [CVPR25] CoLLM: A Large Language Model for Composed Image Retrieval☆29Mar 26, 2025Updated last year
- RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering☆10Nov 27, 2022Updated 3 years ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.☆322Apr 2, 2026Updated last month
- ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos☆23May 21, 2025Updated 11 months ago
- pytorch version of svtr model☆27May 24, 2022Updated 3 years ago
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training☆41Mar 27, 2025Updated last year
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- ☆33Oct 21, 2025Updated 6 months ago
- ☆28Jul 30, 2024Updated last year
- ☆29Dec 5, 2021Updated 4 years ago
- This repo is the official pytorch implementation of the paper: CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-V…☆42Sep 10, 2025Updated 7 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories☆100Aug 8, 2025Updated 8 months ago
- [TPAMI 2023] Object Affinity Learning: Towards Annotation-free Instance Segmentation☆14Sep 14, 2023Updated 2 years ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆64Nov 5, 2024Updated last year
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution"☆79Sep 13, 2025Updated 7 months ago
- ☆25Jun 5, 2025Updated 11 months ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- [BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization☆20Sep 11, 2024Updated last year
- Retrieval_OOD_for_Multimodal_AI☆11Dec 4, 2024Updated last year
- Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm"☆23Mar 18, 2026Updated last month
- ☆14Jun 10, 2025Updated 10 months ago
- PyTorch implementation of EfficientNet-lite and a spectrum of pre-trained models on ImageNet☆11Mar 20, 2020Updated 6 years ago
- MuJoCo benchmark for Deep Reinforcement Learning as provided by Tianshou framework.☆15Jan 12, 2025Updated last year
- ☆115Oct 21, 2025Updated 6 months ago
- ☆12Jun 18, 2024Updated last year
- Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025☆30Apr 8, 2025Updated last year
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- Spectral Graph Attention Network with Fast Eigen-approximation☆11Dec 24, 2021Updated 4 years ago
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering☆20Jul 6, 2023Updated 2 years ago
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆55Feb 10, 2025Updated last year
- Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation (NeurIPS 23)☆12May 7, 2025Updated 11 months ago
- 使用DDPG算法解决rlschool中无 人机悬停控制的问题(内含训练了9个小时的良模型)☆10Jul 7, 2020Updated 5 years ago
- 비디오 기반 인공지능 대화시스템☆11Aug 16, 2023Updated 2 years ago
- [NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆145Dec 26, 2024Updated last year