kyegomez / LIMoE
Implementation of the "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts"
☆22Updated this week
Related projects ⓘ
Alternatives and complementary repositories for LIMoE
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆16Updated this week
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…☆11Updated 9 months ago
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆23Updated this week
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆24Updated this week
- ☆72Updated 8 months ago
- ☆103Updated 3 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated this week
- An benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆33Updated 11 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling☆30Updated 7 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago
- ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆50Updated 6 months ago
- 【NeurIPS 2024】Dense Connector for MLLMs☆133Updated 3 weeks ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆37Updated 6 months ago
- ☆37Updated last year
- Official repo for StableLLAVA☆90Updated 10 months ago
- Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆113Updated last month
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆130Updated last month
- ☆56Updated 9 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆58Updated 10 months ago
- Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆144Updated this week
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆98Updated 2 weeks ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆48Updated last month
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆33Updated 2 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆81Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆38Updated 4 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".☆41Updated last week
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models"☆21Updated this week
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner☆110Updated this week
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds☆82Updated 4 months ago
- ☆21Updated 2 months ago