WenjunHuang94 / ML-Mamba
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
☆63Updated 4 months ago
Alternatives and similar repositories for ML-Mamba:
Users who are interested in ML-Mamba are comparing it to the libraries listed below
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆81Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆51Updated 5 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models☆57Updated 4 months ago
- This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision☆73Updated 10 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆34Updated this week
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆19Updated 5 months ago
- Code for CVPR2025 "MMRL: Multi-Modal Representation Learning for Vision-Language Models".☆27Updated last week
- PyTorch implementation for CVPR 2024 paper: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation☆37Updated last month
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)☆20Updated 8 months ago
- Implementation of ViTAR: Vision Transformer with Any Resolution in PyTorch☆35Updated 5 months ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper: "Multimodal Contrastive Learning with…☆28Updated last week
- Official implementation of NeurIPS 2024 "Visual Fourier Prompt Tuning"☆26Updated 3 months ago
- [CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention☆13Updated last month
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆23Updated 6 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆18Updated 2 months ago
- [NeurIPS 2024] official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone".☆37Updated 2 months ago
- CLIP-MoE: Mixture of Experts for CLIP☆31Updated 6 months ago
- GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [CVPR 2025]☆89Updated 3 weeks ago
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation☆70Updated 8 months ago
- [ECCV 2024] Soft Prompt Generation for Domain Generalization☆20Updated 6 months ago
- [CVPR 2025 Highlight] Official PyTorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models"☆32Updated last week
- Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models☆76Updated last week
- The official code for "TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning" | [AAAI2025]☆34Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding