yuecao0119/MMFuser

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/yuecao0119/MMFuser)

yuecao0119 / MMFuser

The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". MMFuser addresses the limitations of current MLLMs in capturing complex image details by simply yet efficiently integrating multi-layer features from ViTs.

☆63

Alternatives and similar repositories for MMFuser

Users that are interested in MMFuser are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

yuecao0119 / MMInstruct
View on GitHub
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64Nov 7, 2024Updated last year
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
locuslab / llava-token-compression
View on GitHub
☆47Nov 8, 2024Updated last year
OpenGVLab / SDLM
View on GitHub
Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation lengt…
☆98Dec 27, 2025Updated 7 months ago
Hon-Wong / ByteVideoLLM
View on GitHub
[ICCV 2025] Dynamic-VLM
☆28Dec 16, 2024Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
CG-Bench / CG-Bench
View on GitHub
☆20Jan 26, 2025Updated last year
Echo0125 / MAT-Memory-and-Anticipation-Transformer
View on GitHub
[ICCV 2023] Official implementation of Memory-and-Anticipation Transformer for Online Action Understanding
☆50Oct 7, 2023Updated 2 years ago
JieShibo / MemVP
View on GitHub
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
☆49May 12, 2024Updated 2 years ago
Sharpiless / Pix2seq-mmdetection
View on GitHub
Unofficial implement of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection
☆34Apr 18, 2022Updated 4 years ago
guanxiongsun / vfe.pytorch
View on GitHub
Video Feature Enhancement with PyTorch
☆32Nov 28, 2024Updated last year
ludc506 / InternVL-X
View on GitHub
☆16Mar 26, 2025Updated last year
wanglu-cs / Think_While_Watching
View on GitHub
☆19Jun 26, 2026Updated last month
Rubics-Xuan / IVG
View on GitHub
This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…
☆15May 21, 2024Updated 2 years ago
gregor-ge / mBLIP
View on GitHub
☆88Jan 10, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
brown-palm / AntGPT
View on GitHub
Official code implemtation of paper AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
☆31Sep 23, 2024Updated last year
TideDra / VL-RLHF
View on GitHub
A RLHF Infrastructure for Vision-Language Models
☆201Nov 15, 2024Updated last year
TIMMY-CHAN / MILE
View on GitHub
[MICCAI 2024] Can LLMs' Tuning Methods Work in Medical Multimodal Domain?
☆17Sep 18, 2024Updated last year
Hui-design / R1-Video-fixbug
View on GitHub
[Blog 1] Recording a bug of grpo_trainer in some R1 projects
☆23Feb 23, 2025Updated last year
shubhamgpt007 / CrisisKAN
View on GitHub
Code repository of paper "CrisisKAN: Knowledge-infused and Explainable Multimodal Attention Network for Crisis Event Classification" publ…
☆12Jul 15, 2025Updated last year
EIT-NLP / Layer_Select_Fuse_for_MLLM
View on GitHub
[CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practi…
☆48Oct 29, 2025Updated 9 months ago
aiha-lab / InfiniPot-V
View on GitHub
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20Jan 25, 2026Updated 6 months ago
zsgvivo / VideoZoomer
View on GitHub
☆34Feb 12, 2026Updated 5 months ago
PQ3D / PQ3D
View on GitHub
Official implementation of the paper "Unifying 3D Vision-Language Understanding via Promptable Queries"
☆86Aug 2, 2024Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
yu-rp / VisualPerceptionToken
View on GitHub
☆136Mar 22, 2025Updated last year
hulianyuyy / iLLaVA
View on GitHub
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)
☆23Jun 24, 2026Updated last month
MCG-NJU / p-MoD
View on GitHub
[ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
☆44Jun 26, 2025Updated last year
zehuichen123 / NoiseDet
View on GitHub
[ICCV2023] NoiseDet: Learning from Noisy Data for Semi-Superivsed 3D Object Detection
☆20Feb 5, 2023Updated 3 years ago
Ziyang412 / Video-RTS
View on GitHub
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24Feb 18, 2026Updated 5 months ago
ahmedssabir / Belief-Revision-Score
View on GitHub
Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022
☆11Apr 13, 2025Updated last year
ChangyaoTian / ADDP
View on GitHub
The official implementation of ADDP (ICLR 2024)
☆12Mar 27, 2024Updated 2 years ago
luka-group / mDPO
View on GitHub
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆88Nov 10, 2024Updated last year
alenai97 / PEFT-MLLM
View on GitHub
Official Code and data for ACL 2024 finding, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"
☆25Nov 10, 2024Updated last year
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
Yxxxb / VoCo-LLaMA
View on GitHub
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205Jun 18, 2025Updated last year
baaivision / EVE
View on GitHub
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆376Jul 24, 2025Updated last year
jiangsongtao / Med-MoE
View on GitHub
[EMNLP'24] Code and data for paper "Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models"
☆158Jul 7, 2025Updated last year
haonanwang0522 / GTPT
View on GitHub
[ECCV 2024] GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
☆19Oct 5, 2024Updated last year
zhiqi-li / WechatLogger
View on GitHub
一个mmcv 的logger hook, 可以用来把模型结果推送到微信上
☆21Oct 11, 2022Updated 3 years ago
ayiyayi / EgoExoBench
View on GitHub
☆15Nov 13, 2025Updated 8 months ago
OpenGVLab / V2PE
View on GitHub
[ICCV2025] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆60Apr 4, 2026Updated 3 months ago