WenjunHuang94 / ML-Mamba
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
☆64 · Updated 5 months ago
Alternatives and similar repositories for ML-Mamba:
Users who are interested in ML-Mamba are comparing it to the libraries listed below.
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 · Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆52 · Updated 6 months ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper "Multimodal Contrastive Learning with… ☆29 · Updated last month
- The official implementation of "Autoregressive Pretraining with Mamba in Vision" ☆77 · Updated 10 months ago
- ☆43 · Updated 2 weeks ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆20 · Updated 9 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 · Updated 6 months ago
- PyTorch implementation of the CVPR 2024 paper "Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation" ☆42 · Updated 2 weeks ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆17 · Updated 3 weeks ago
- [CVPR 2025] Breaking the Low-Rank Dilemma of Linear Attention ☆17 · Updated last month
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 · Updated 6 months ago
- Official implementation of the NeurIPS 2024 paper "Visual Fourier Prompt Tuning" ☆26 · Updated 3 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆49 · Updated last month
- CLIP-MoE: Mixture of Experts for CLIP ☆32 · Updated 6 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆57 · Updated 2 weeks ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆23 · Updated 2 months ago
- Official repository of Polarity-aware Linear Attention for Vision Transformers (ICLR 2025) ☆60 · Updated 2 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆59 · Updated this week
- ☆36 · Updated 9 months ago
- ☆72 · Updated 6 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆31 · Updated 4 months ago
- ☆66 · Updated 2 months ago
- [NeurIPS 2024] Official code release for the paper "Revisiting the Integration of Convolution and Attention for Vision Backbone" ☆39 · Updated 3 months ago
- ☆18 · Updated 5 months ago
- Implementation of "ViTAR: Vision Transformer with Any Resolution" in PyTorch ☆35 · Updated 5 months ago
- [ECCV 2024] Soft Prompt Generation for Domain Generalization ☆22 · Updated 7 months ago
- An efficient PyTorch implementation of selective scan in one file, works with both CPU and GPU, with corresponding mathematical derivatio… ☆86 · Updated last year
- [CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆188 · Updated last month
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆164 · Updated 3 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆18 · Updated 3 months ago