NVlabs / EAGLE

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

☆876

Alternatives and similar repositories for EAGLE

Users interested in EAGLE also compare it with the repositories listed below.


xmoanvaf / llava-phi
☆399 · Updated 9 months ago
xiaoachen98 / Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
☆422 · Updated 11 months ago
FoundationVision / Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
☆577 · Updated last year
Oryx-mllm / Oryx
[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
☆326 · Updated 3 months ago
bytedance / Sa2VA
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
☆1,290 · Updated 3 weeks ago
FoundationVision / Liquid
Liquid: Language Models are Scalable and Unified Multi-modal Generators
☆616 · Updated 5 months ago
mbzuai-oryx / LlamaV-o1
[ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs
☆306 · Updated 4 months ago
microsoft / LLM2CLIP
LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA.
☆551 · Updated 3 months ago
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,721 · Updated this week
allenai / molmo
Code for the Molmo Vision-Language Model
☆761 · Updated 9 months ago
ZiyuGuo99 / Image-Generation-CoT
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
☆808 · Updated 4 months ago
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,045 · Updated 10 months ago
CircleRadon / TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆269 · Updated 4 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆707 · Updated 2 weeks ago
thunlp / LLaVA-UHD
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
☆388 · Updated 5 months ago
DCDmllm / Cheetah
☆342 · Updated last year
DAMO-NLP-SG / VideoRefer
[CVPR 2025] The code for "VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM"
☆268 · Updated last month
dvlab-research / VisionThink
[NeurIPS 2025] Efficient Reasoning Vision Language Models
☆395 · Updated 2 weeks ago
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…
☆483 · Updated last month
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆609 · Updated 6 months ago
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆736 · Updated 3 weeks ago
OpenGVLab / VisionLLM
VisionLLM Series
☆1,108 · Updated 7 months ago
shenyunhang / APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
☆592 · Updated last year
MME-Benchmarks / Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
☆654 · Updated last month
UCSC-VLAA / OpenVision
[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
☆389 · Updated 3 weeks ago
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆398 · Updated 4 months ago
pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆491 · Updated 9 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-1.5
Fully Open Framework for Democratized Multimodal Training
☆380 · Updated this week
Ola-Omni / Ola
Ola: Pushing the Frontiers of Omni-Modal Language Model
☆371 · Updated 3 months ago
JiuhaiChen / CVPR2025-Florence-VL
☆241 · Updated 9 months ago