rhymes-ai / Aria
Codebase for Aria - an Open Multimodal Native MoE
☆850 · Updated this week
Related projects
Alternatives and complementary repositories for Aria
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆1,116 · Updated last week
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆677 · Updated 3 months ago
- ☆1,184 · Updated this week
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆534 · Updated 2 weeks ago
- OLMoE: Open Mixture-of-Experts Language Models ☆461 · Updated this week
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆813 · Updated 4 months ago
- ☆837 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,841 · Updated 3 months ago
- GRadient-INformed MoE ☆260 · Updated last month
- [NeurIPS'24 Spotlight] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces in… ☆796 · Updated this week
- Agent S: an open agentic framework that uses computers like a human ☆616 · Updated this week
- Large Reasoning Models ☆620 · Updated this week
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆801 · Updated 3 weeks ago
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills ☆709 · Updated 9 months ago
- MobiLlama: Small Language Model tailored for edge devices ☆595 · Updated 8 months ago
- Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple te… ☆611 · Updated this week
- HPT - Open Multimodal LLMs from HyperGAI ☆312 · Updated 5 months ago
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ☆780 · Updated 2 months ago
- Official repository for the paper PLLaVA ☆594 · Updated 3 months ago
- Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. ☆484 · Updated this week
- VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs ☆903 · Updated this week
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆775 · Updated 3 months ago
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ☆503 · Updated 3 months ago
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… ☆495 · Updated 2 weeks ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆804 · Updated 3 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆648 · Updated last month
- ☆701 · Updated 8 months ago
- ☆286 · Updated 2 weeks ago
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆173 · Updated this week