kyegomez / LIMoELinks

Implementation of the "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts"

☆31

Alternatives and similar repositories for LIMoE

Users that are interested in LIMoE are comparing it to the libraries listed below

Sorting:

SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆60Updated 4 months ago
kyegomez / MC-ViT
Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"
☆20Updated 3 months ago
sunsmarterjie / ChatterBox
[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆55Updated 2 months ago
MCR-PEFT / Ex-MCR
☆45Updated last month
NVlabs / STL
Official Pytorch Implementation of Self-emerging Token Labeling
☆34Updated last year
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆56Updated 8 months ago
WenjunHuang94 / ML-Mamba
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2
☆66Updated 7 months ago
AIFEG / BenchLMM
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
☆85Updated 10 months ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆27Updated 2 months ago
mlvlab / RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆41Updated 10 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆163Updated 9 months ago
JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆68Updated 2 months ago
FudanNLPLAB / MouSi
☆73Updated last year
layer6ai-labs / fusemix
Data-Efficient Multimodal Fusion on a Single GPU
☆66Updated last year
ByungKwanLee / Phantom
[Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…
☆60Updated 9 months ago
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆42Updated 9 months ago
kyegomez / Mirasol
Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆26Updated 5 months ago
alibaba / conv-llava
☆115Updated 11 months ago
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆206Updated 6 months ago
microsoft / A-CLIP
Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653)
☆32Updated last year
kyegomez / BRAVE-ViT-Swarm
Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models"
☆26Updated this week
isekai-portal / Link-Context-Learning
☆98Updated last year
THUDM / Awesome-Parameter-Efficient-Fine-Tuning-for-Foundation-Models
Parameter-Efficient Fine-Tuning for Foundation Models
☆75Updated 3 months ago
FactoDeepLearning / MultitaskVLFM
☆26Updated last year
OpenGVLab / Multitask-Model-Selector
[NIPS2023]Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector
☆37Updated last year
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆159Updated 6 months ago
om-ai-lab / ZoomEye
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆46Updated 6 months ago
AILab-CVC / M2PT
[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
☆99Updated last year
iancovert / locality-alignment
☆50Updated 6 months ago
XiaoduoAILab / XmodelVLM
☆68Updated last year