WenjunHuang94 / ML-Mamba
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
☆51 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for ML-Mamba
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆78 · Updated 7 months ago
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation ☆66 · Updated 2 months ago
- Official implementation of "Autoregressive Pretraining with Mamba in Vision" ☆63 · Updated 4 months ago
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆35 · Updated 2 weeks ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆56 · Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆137 · Updated this week
- ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues ☆50 · Updated 6 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆46 · Updated last week
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆89 · Updated last month
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆55 · Updated 2 weeks ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆137 · Updated 3 weeks ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆59 · Updated last month
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆132 · Updated last month
- Official PyTorch Implementation of Self-emerging Token Labeling ☆30 · Updated 7 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆48 · Updated last month
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆58 · Updated 10 months ago
- An efficient PyTorch implementation of selective scan in one file, works with both cpu and gpu, with corresponding mathematical derivatio… ☆71 · Updated 8 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆47 · Updated 6 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM ☆56 · Updated 2 weeks ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 · Updated 6 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 · Updated last month
- Official implementation of paper titled "GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model" ☆60 · Updated 3 months ago
- [ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models ☆81 · Updated 2 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆25 · Updated 3 weeks ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆81 · Updated 4 months ago
- The official implementation of RAR ☆72 · Updated 7 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆42 · Updated last week
- Visual self-questioning for large vision-language assistant. ☆31 · Updated last month