NVlabs / EAGLE
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
☆803 · Updated last month
Alternatives and similar repositories for EAGLE
Users that are interested in EAGLE are comparing it to the libraries listed below
- An open-source implementation for training LLaVA-NeXT. ☆398 · Updated 8 months ago
- [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆568 · Updated last year
- ☆387 · Updated 6 months ago
- [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆312 · Updated 3 months ago
- [ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,480 · Updated this week
- Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval. ☆2,635 · Updated this week
- 🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos ☆1,150 · Updated 3 weeks ago
- Liquid: Language Models are Scalable and Unified Multi-modal Generators ☆592 · Updated 2 months ago
- A family of lightweight multimodal models. ☆1,022 · Updated 7 months ago
- [CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation ☆724 · Updated last month
- Code for the Molmo Vision-Language Model ☆506 · Updated 6 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer ☆380 · Updated 2 months ago
- [NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,063 · Updated 8 months ago
- LLM2CLIP makes a SOTA pretrained CLIP model even more SOTA. ☆526 · Updated 2 months ago
- Ola: Pushing the Frontiers of Omni-Modal Language Model ☆341 · Updated last week
- ✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ☆569 · Updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆569 · Updated 3 weeks ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆302 · Updated last month
- Long Context Transfer from Language to Vision ☆381 · Updated 3 months ago
- ☆233 · Updated 6 months ago
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV 2025 ☆252 · Updated 3 weeks ago
- [ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models ☆438 · Updated 5 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆322 · Updated 11 months ago
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆260 · Updated last year
- A family of versatile and state-of-the-art video tokenizers. ☆397 · Updated 2 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆301 · Updated last week
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception ☆570 · Updated last year
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆893 · Updated 2 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆592 · Updated 3 months ago
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning ☆331 · Updated 6 months ago