NVlabs / EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
☆404 · Updated this week
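EAGLE studies how to combine several complementary vision encoders in one multimodal LLM; one of the fusion strategies the paper explores is simple channel-wise concatenation of aligned visual token features. The sketch below illustrates that idea in NumPy only; the function name and tensor shapes are illustrative assumptions, not the repository's API (the actual model also aligns spatial grids and projects the fused features into the LLM embedding space).

```python
import numpy as np

def fuse_encoder_features(feature_maps):
    """Fuse per-encoder visual tokens by channel concatenation.

    Each encoder yields tokens of shape (num_tokens, dim_i). Assuming the
    token counts have already been aligned, fusion is a concatenation along
    the channel axis. This is a hypothetical simplification of the
    mixture-of-encoders design, not EAGLE's actual code.
    """
    num_tokens = feature_maps[0].shape[0]
    if not all(f.shape[0] == num_tokens for f in feature_maps):
        raise ValueError("all encoders must produce the same number of tokens")
    return np.concatenate(feature_maps, axis=1)

# Illustrative shapes: a CLIP-like encoder (576 tokens x 1024 channels)
# fused with a ConvNeXt-like encoder (576 tokens x 768 channels).
clip_feats = np.zeros((576, 1024))
convnext_feats = np.zeros((576, 768))
fused = fuse_encoder_features([clip_feats, convnext_feats])
print(fused.shape)  # (576, 1792)
```

The fused tokens keep one row per visual token, so a single linear projection can map them into the language model's input space regardless of how many encoders contribute.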
Related projects:
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization (☆537, updated 3 months ago)
- An open-source implementation for training LLaVA-NeXT (☆243, updated 3 months ago)
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts (☆275, updated 2 months ago)
- LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images (☆298, updated last month)
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks (☆356, updated 2 months ago)
- When do we not need larger vision models? (☆314, updated last month)
- Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance of num… (☆303, updated 5 months ago)
- [CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners (☆343, updated last year)
- Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference (☆240, updated last month)
- PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs" (☆502, updated 8 months ago)
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills (☆692, updated 7 months ago)
- OmniTokenizer: one model and one weight for image-video joint tokenization (☆228, updated 2 months ago)
- Official repository for the paper PLLaVA (☆551, updated last month)
- Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding (☆188, updated last month)
- Long Context Transfer from Language to Vision (☆293, updated 3 weeks ago)
- LLaVA-HR: High-Resolution Large Language-Vision Assistant (☆202, updated last month)
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … (☆444, updated last month)
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? (☆133, updated 2 weeks ago)
- Accelerating the development of large multimodal models (LMMs) with lmms-eval (☆1,349, updated this week)
- LLaVA-Interactive-Demo (☆344, updated last month)
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models (☆235, updated 8 months ago)
- [AAAI 2024] BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions (☆260, updated 5 months ago)
- Official repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning (☆208, updated last week)
- Famous Vision Language Models and Their Architectures (☆295, updated last week)
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models (arXiv 2023 / CVPR 2024) (☆255, updated 5 months ago)
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… (☆742, updated 3 months ago)