ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆86 · Updated Mar 21, 2024
Alternatives and similar repositories for VL-Mamba
Users interested in VL-Mamba are comparing it to the libraries listed below.
- [AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference (☆293 · Updated Jan 8, 2025)
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning (☆50 · Updated May 12, 2024)
- ☆33 · Updated Apr 11, 2025
- ☆37 · Updated Sep 16, 2024
- Code for paper LocalMamba: Visual State Space Model with Windowed Selective Scan (☆275 · Updated May 6, 2024)
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … (☆424 · Updated Dec 22, 2024)
- [ICASSP'25] Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues (☆17 · Updated Dec 31, 2024)
- [IEEE TCSVT] Vivim: a Video Vision Mamba for Medical Video Segmentation (☆185 · Updated Jun 12, 2025)
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im… (☆116 · Updated May 30, 2024)
- [ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding (☆1,076 · Updated Jul 6, 2024)
- ☆17 · Updated Mar 30, 2024
- A repository for DenseSSMs (☆89 · Updated Apr 11, 2024)
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture (☆213 · Updated Jan 6, 2025)
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability (☆105 · Updated Nov 28, 2024)
- LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs (☆413 · Updated Dec 20, 2025)
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition (☆86 · Updated Apr 6, 2025)
- Official code for paper "Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models, ICML2024" (☆27 · Updated Feb 2, 2025)
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment (☆35 · Updated Jul 1, 2024)
- Simba (☆217 · Updated Mar 24, 2024)
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models (☆43 · Updated Apr 10, 2025)
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* (☆109 · Updated May 27, 2025)
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding (☆39 · Updated Mar 16, 2025)
- Awesome Papers related to Mamba (☆1,389 · Updated Oct 17, 2024)
- [ICLR2025] This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision (☆90 · Updated May 30, 2025)
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning (☆39 · Updated Oct 14, 2025)
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model (☆22 · Updated Aug 5, 2024)
- Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model (☆13 · Updated Feb 15, 2024)
- ☆50 · Updated Jan 28, 2025
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models (☆48 · Updated this week)
- Dataset and baseline for Scenario Oriented Object Navigation (SOON) (☆22 · Updated Nov 23, 2021)
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training (☆26 · Updated Dec 5, 2023)
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context (☆173 · Updated Sep 25, 2024)
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models (☆233 · Updated Nov 7, 2025)
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey (☆477 · Updated Jan 17, 2025)
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning (☆79 · Updated May 17, 2025)
- ☆27 · Updated Apr 11, 2025
- ☆33 · Updated Nov 18, 2025
- Implementation of MambaFormer in PyTorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… (☆21 · Updated this week)
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" (☆11 · Updated Oct 11, 2024)