The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention
☆3,419 · Jul 7, 2025 · Updated 10 months ago
Alternatives and similar repositories for MiniMax-01
Users who are interested in MiniMax-01 are comparing it to the repositories listed below.
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,117 · Apr 3, 2025 · Updated last year
- ☆3,474 · Mar 7, 2025 · Updated last year
- 🚀 Efficient implementations for emerging model architectures ☆5,116 · Updated this week
- ☆814 · Jun 9, 2025 · Updated 11 months ago
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. ☆3,150 · Jul 7, 2025 · Updated 10 months ago
- FlashMLA: Efficient Multi-head Latent Attention Kernels ☆12,651 · Apr 30, 2026 · Updated 2 weeks ago
- Explore these applications integrating MiniMax's multimodal API to see how text, vision, and speech processing capabilities are incorpora… ☆76 · Jan 30, 2026 · Updated 3 months ago
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ☆27,228 · Jan 9, 2026 · Updated 4 months ago
- Muon is Scalable for LLM Training ☆1,475 · Aug 3, 2025 · Updated 9 months ago
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆27,836 · Updated this week
- ☆3,185 · Mar 17, 2025 · Updated last year
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆19,193 · Jan 30, 2026 · Updated 3 months ago
- Fully open reproduction of DeepSeek-R1 ☆26,018 · Apr 2, 2026 · Updated last month
- Sky-T1: Train your own O1 preview model within $450 ☆3,383 · Jul 12, 2025 · Updated 10 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.