rhymes-ai / Aria
Codebase for Aria - an Open Multimodal Native MoE
☆978 · Updated last week
Alternatives and similar repositories for Aria:
Users interested in Aria are comparing it to the libraries listed below.
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆715 · Updated 5 months ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆599 · Updated 2 months ago
- An Open Large Reasoning Model for Real-World Solutions ☆1,410 · Updated 2 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ☆821 · Updated 5 months ago
- ☆1,150 · Updated 2 months ago
- OLMoE: Open Mixture-of-Experts Language Models ☆536 · Updated last month
- Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. ☆875 · Updated this week
- Next-Token Prediction is All You Need ☆1,976 · Updated 3 months ago
- ☆1,973 · Updated last week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,913 · Updated 6 months ago
- An open-sourced end-to-end VLM-based GUI Agent ☆628 · Updated last week
- ☆3,316 · Updated 3 months ago
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models ☆1,126 · Updated last year
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆789 · Updated 5 months ago
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆824 · Updated 6 months ago
- ☆291 · Updated this week
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ☆1,040 · Updated this week
- Agent S: an open agentic framework that uses computers like a human ☆771 · Updated this week
- LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning ☆1,768 · Updated last week
- HPT - Open Multimodal LLMs from HyperGAI ☆313 · Updated 7 months ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆867 · Updated 3 months ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆801 · Updated last week
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆257 · Updated 2 weeks ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆721 · Updated 11 months ago
- Scalable RL solution for advanced reasoning of language models ☆974 · Updated this week
- Large Reasoning Models ☆801 · Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,831 · Updated 2 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆2,835 · Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,771 · Updated 2 months ago
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding ☆1,005 · Updated last week