flagos-ai/FlagScale

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/flagos-ai/FlagScale)

flagos-ai / FlagScale

FlagScale is a large model toolkit based on open-sourced projects.

☆514

Alternatives and similar repositories for FlagScale

Users that are interested in FlagScale are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

flagos-ai / FlagCX
View on GitHub
FlagCX is a scalable and adaptive cross-chip communication library.
☆208Updated this week
flagos-ai / FlagGems
View on GitHub
FlagGems is an operator library for large language models implemented in the Triton Language.
☆996Updated this week
FlagAI-Open / OpenSeek
View on GitHub
OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next…
☆255Jan 22, 2026Updated 4 months ago
flagos-ai / FlagAttention
View on GitHub
A collection of memory efficient attention operators implemented in the Triton language.
☆291Jun 5, 2024Updated last year
flagos-ai / FlagPerf
View on GitHub
FlagPerf is an open-source software platform for benchmarking AI chips.
☆365Nov 11, 2025Updated 6 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
feifeibear / long-context-attention
View on GitHub
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆667Jan 15, 2026Updated 4 months ago
flagos-ai / FlagTree
View on GitHub
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang…
☆264May 16, 2026Updated last week
FlagAI-Open / Aquila2
View on GitHub
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
☆445Oct 11, 2024Updated last year
volcengine / veScale
View on GitHub
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,014Mar 3, 2026Updated 2 months ago
alibaba / Pai-Megatron-Patch
View on GitHub
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
☆1,571Dec 15, 2025Updated 5 months ago
Oneflow-Inc / serving
View on GitHub
OneFlow Serving
☆20Apr 10, 2025Updated last year
sail-sg / zero-bubble-pipeline-parallelism
View on GitHub
Zero Bubble Pipeline Parallelism
☆456May 7, 2025Updated last year
alibaba / ChatLearn
View on GitHub
A flexible and efficient training framework for large-scale alignment tasks
☆453Oct 23, 2025Updated 7 months ago
alibaba / Megatron-LLaMA
View on GitHub
Best practice for training LLaMA models in Megatron-LM
☆665Jan 2, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
flageval-baai / CMMU
View on GitHub
[IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
☆26Feb 1, 2024Updated 2 years ago
intelligent-machine-learning / dlrover
View on GitHub
DLRover: An Automatic Distributed Deep Learning System
☆1,655Apr 15, 2026Updated last month
yester31 / Cutlass_EX
View on GitHub
study of cutlass
☆22Nov 10, 2024Updated last year
NVIDIA / Megatron-LM
View on GitHub
Ongoing research training transformer models at scale
☆16,427Updated this week
kvcache-ai / Mooncake
View on GitHub
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,339May 15, 2026Updated last week
deepspeedai / Megatron-DeepSpeed
View on GitHub
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,246Aug 14, 2025Updated 9 months ago
bytedance / flux
View on GitHub
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,309Aug 28, 2025Updated 8 months ago
NVIDIA / TransformerEngine
View on GitHub
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,340May 14, 2026Updated last week
infinigence / Semi-PD
View on GitHub
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆126Dec 25, 2025Updated 4 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
deepspeedai / DeepSpeed-Kernels
View on GitHub
☆75Mar 26, 2025Updated last year
bigscience-workshop / Megatron-DeepSpeed
View on GitHub
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,440Mar 20, 2024Updated 2 years ago
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆5,621May 16, 2026Updated last week
alibaba / EasyParallelLibrary
View on GitHub
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆271Mar 31, 2023Updated 3 years ago
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 9 months ago
FlagAI-Open / FlagAI
View on GitHub
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale model.
☆3,875May 11, 2026Updated last week
InternLM / InternEvo
View on GitHub
InternEvo is an open-sourced lightweight training framework aims to support model pre-training without the need for extensive dependencie…
☆419Aug 21, 2025Updated 9 months ago
flagos-ai / awesome-LLM-driven-kernel-generation
View on GitHub
Review automated kernel generation in the era of LLMs
☆214May 14, 2026Updated last week
dame-cell / Triformer
View on GitHub
Transformers components but in Triton
☆34May 9, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
FlagOpen / Infinity-Instruct
View on GitHub
☆52Jun 14, 2024Updated last year
microsoft / sarathi-serve
View on GitHub
A low-latency & high-throughput serving engine for LLMs
☆502Jan 8, 2026Updated 4 months ago
ai-dynamo / nixl
View on GitHub
NVIDIA Inference Xfer Library (NIXL)
☆1,041Updated this week
zhuzilin / ring-flash-attention
View on GitHub
Ring attention implementation with flash attention
☆1,020Sep 10, 2025Updated 8 months ago
perplexityai / pplx-kernels
View on GitHub
Perplexity GPU Kernels
☆576Nov 7, 2025Updated 6 months ago
Adlik / model_zoo
View on GitHub
☆11Dec 26, 2025Updated 4 months ago
ModelTC / LightLLM
View on GitHub
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili…
☆4,073Updated this week