WenjunHuang94 / ML-Mamba
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
☆52Updated this week
Related projects
Alternatives and complementary repositories for ML-Mamba
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆17Updated 3 weeks ago
- This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision☆64Updated 5 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆78Updated 8 months ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation☆56Updated 2 months ago
- [NeurIPS2024 Spotlight] The official implementation of GrootVL: Tree Topology is All You Need in State Space Model☆83Updated 5 months ago
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.☆35Updated 3 weeks ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆132Updated last month
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation☆65Updated 3 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated 2 weeks ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities☆94Updated 8 months ago
- Improving Mamba performance on Video Understanding tasks☆30Updated last month
- Official implementation of paper titled "GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model"☆61Updated 4 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆44Updated 3 weeks ago
- PyTorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆23Updated last week
- ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆50Updated 6 months ago
- Making LLaVA Tiny via MoE-Knowledge Distillation☆60Updated 3 weeks ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆96Updated last week
- This repo contains the source code for VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks (NeurIPS 2024).☆24Updated last month
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries".☆47Updated 6 months ago
- Introduce Mamba2 to Vision.☆93Updated 2 months ago
- 【CVPR2024】Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification☆86Updated 3 weeks ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆26Updated last month
- 【NeurIPS 2024】Dense Connector for MLLMs☆140Updated last month
- Official PyTorch Implementation of Self-emerging Token Labeling☆30Updated 7 months ago