rhymes-ai/Aria

Codebase for Aria - an Open Multimodal Native MoE

☆1,088

Alternatives and similar repositories for Aria

Users who are interested in Aria are comparing it to the repositories listed below.

rhymes-ai / Allegro
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple te…
☆1,134 · Feb 7, 2025 · Updated last year
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆406 · Mar 18, 2025 · Updated last year
LLaVA-VL / LLaVA-NeXT
☆4,699 · Jun 15, 2026 · Updated 3 weeks ago
longvideobench / LongVideoBench
[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆131 · Jul 27, 2024 · Updated last year
facebookresearch / chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
☆2,102 · Jul 29, 2024 · Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,005 · Nov 7, 2025 · Updated 8 months ago
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,828 · Mar 12, 2026 · Updated 3 months ago
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,426 · Jan 12, 2026 · Updated 5 months ago
InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,922 · May 26, 2025 · Updated last year
ATH-MaaS / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆1,452 · Feb 11, 2026 · Updated 4 months ago
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆669 · Jul 28, 2024 · Updated last year
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 · Dec 6, 2024 · Updated last year
PKU-YuanGroup / LLaVA-CoT
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
☆2,136 · Dec 12, 2025 · Updated 6 months ago
mlfoundations / MINT-1T
🍃 MINT-1T: A one trillion token multimodal interleaved dataset.
☆832 · Jul 31, 2024 · Updated last year
apple / ml-aim
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
☆1,425 · Aug 4, 2025 · Updated 11 months ago
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆4,061 · May 19, 2025 · Updated last year
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211 · Jan 6, 2025 · Updated last year
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,583 · Feb 8, 2025 · Updated last year
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆428 · May 8, 2025 · Updated last year
NVlabs / Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
☆3,071 · Jun 24, 2026 · Updated 2 weeks ago
kijai / ComfyUI-MochiWrapper
☆795 · Nov 11, 2024 · Updated last year
PKU-YuanGroup / MoE-LLaVA
[TMM 2025🔥] Mixture-of-Experts for Large Vision-Language Models
☆2,322 · Jul 15, 2025 · Updated 11 months ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,552 · Jan 30, 2026 · Updated 5 months ago
yale-nlp / TOMATO
☆41 · Nov 8, 2024 · Updated last year
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,841 · Nov 27, 2025 · Updated 7 months ago
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆126 · Nov 25, 2024 · Updated last year
microsoft / LLM2CLIP
LLM2CLIP significantly improves already state-of-the-art CLIP models.
☆672 · Feb 1, 2026 · Updated 5 months ago
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,776 · Jan 12, 2026 · Updated 5 months ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated 2 years ago
DAMO-NLP-SG / VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,303 · Jan 23, 2025 · Updated last year
open-compass / VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,253 · Updated this week
VectorSpaceLab / OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
☆4,328 · Dec 4, 2025 · Updated 7 months ago
Oryx-mllm / Oryx
[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
☆329 · Jul 4, 2025 · Updated last year
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆196 · Mar 17, 2025 · Updated last year
google / imageinwords
Data release for the ImageInWords (IIW) paper.
☆224 · Nov 17, 2024 · Updated last year
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423 · Updated this week
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
☆10,088 · Sep 22, 2025 · Updated 9 months ago
EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,279 · Updated this week
genmoai / mochi
The best OSS video generation models, created by Genmo
☆3,689 · Nov 14, 2025 · Updated 7 months ago