Ekoda / SoftMoE
A Soft Mixture-of-Experts Vision Transformer, addressing the MoE limitations highlighted by Puigcerver et al. (2023).
☆15 · Updated 2 years ago
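For orientation, the core idea behind Soft MoE (Puigcerver et al., 2023) is that every slot receives a soft, learned convex combination of *all* input tokens, so no tokens are dropped and no discrete routing is needed. The sketch below is a minimal, illustrative PyTorch version under our own assumptions (class name, MLP expert shape); it is not the code from this repository.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Minimal Soft MoE layer (sketch after Puigcerver et al., 2023).

    Each of the `num_experts * slots_per_expert` slots is a convex
    combination of all input tokens (soft dispatch); expert outputs are
    then softly combined back per token (soft combine).
    """

    def __init__(self, dim, num_experts=4, slots_per_expert=1):
        super().__init__()
        self.num_experts = num_experts
        n_slots = num_experts * slots_per_expert
        # One learnable routing vector per slot.
        self.phi = nn.Parameter(torch.randn(dim, n_slots) * dim ** -0.5)
        # Experts as plain MLPs here; a ViT would use its MLP block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(),
                          nn.Linear(dim * 2, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (batch, tokens, dim)
        logits = x @ self.phi                  # (b, n_tokens, n_slots)
        dispatch = logits.softmax(dim=1)       # normalize over tokens
        combine = logits.softmax(dim=-1)       # normalize over slots
        slots = dispatch.transpose(1, 2) @ x   # (b, n_slots, dim)
        # Each expert processes its own contiguous group of slots.
        groups = slots.chunk(self.num_experts, dim=1)
        outs = torch.cat([f(s) for f, s in zip(self.experts, groups)],
                         dim=1)                # (b, n_slots, dim)
        return combine @ outs                  # (b, n_tokens, dim)
```

Because dispatch and combine are both dense softmaxes, the layer is fully differentiable and avoids the load-balancing losses and token dropping of top-k routed MoEs.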
Alternatives and similar repositories for SoftMoE
Users interested in SoftMoE are comparing it with the repositories listed below.
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at NeurIPS 2023 (Main confer… ☆27 · Updated last year
- ☆93 · Updated 2 years ago
- The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention" ☆234 · Updated 4 months ago
- Awesome List of Vision Language Prompt Papers ☆46 · Updated 2 years ago
- [CVPR 2024] Official implementation of CLIP-KD: An Empirical Study of CLIP Model Distillation ☆143 · Updated 5 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆86 · Updated last year
- [NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model ☆89 · Updated 2 years ago
- ImageNet-1K data download and processing for use as a dataset ☆125 · Updated 3 years ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆145 · Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆98 · Updated last year
- [CVPR 2024] Official implementation of the paper "DePT: Decoupled Prompt Tuning" ☆109 · Updated 2 months ago
- [AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training ☆106 · Updated 2 years ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆72 · Updated 2 years ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆173 · Updated last year
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" ☆259 · Updated last year
- [CVPR'23 & TPAMI'25] Hard Patches Mining for Masked Image Modeling & Bootstrap Masked Visual Modeling via Hard Patch Mining ☆107 · Updated 9 months ago
- ☆124 · Updated last year
- ☆92 · Updated 2 years ago
- ☆267 · Updated 3 years ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆65 · Updated last year
- 🔥MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer [Official, ICLR 2023] ☆22 · Updated 2 years ago
- PyTorch implementation of LIMoE ☆52 · Updated last year
- The official implementation of the paper "Inter-Instance Similarity Modeling for Contrastive Learning" ☆117 · Updated last year
- [NeurIPS 2024] Dense Connector for MLLMs ☆180 · Updated last year
- Two-way Multi-Label Loss ☆35 · Updated 2 years ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆336 · Updated last year
- [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for C… ☆278 · Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands of Vision Task Types ☆33 · Updated 6 months ago
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning ☆33 · Updated last year
- [CVPR 2024] Code for the paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" ☆269 · Updated 4 months ago