FlagCX is a scalable and adaptive cross-chip communication library.
☆179 Mar 5, 2026 Updated this week
Alternatives and similar repositories for FlagCX
Users who are interested in FlagCX are comparing it to the libraries listed below.
- FlagScale is a large model toolkit based on open-sourced projects. ☆485 Updated this week
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport ☆73 May 9, 2025 Updated 9 months ago
- Kernel Library Wheel for SGLang ☆16 Updated this week
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆92 Jan 26, 2026 Updated last month
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆909 Updated this week
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆21 Feb 9, 2026 Updated 3 weeks ago
- ☆24 Updated this week
- ☆71 Mar 26, 2025 Updated 11 months ago
- Venus Collective Communication Library, supported by SII and Infrawaves. ☆138 Updated this week
- High performance inference engine for diffusion models ☆105 Sep 5, 2025 Updated 6 months ago
- ☆74 Updated this week
- ☆26 Feb 17, 2025 Updated last year
- vLLM Daily Summarization of Merged PRs ☆46 Updated this week
- ☆54 Mar 15, 2025 Updated 11 months ago
- A lightweight design for computation-communication overlap. ☆223 Jan 20, 2026 Updated last month
- ☆41 Apr 25, 2024 Updated last year
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆55 Updated this week
- ☆74 Oct 31, 2024 Updated last year
- ☆13 Jan 7, 2025 Updated last year
- OneFlow Diffusers Web UI ☆11 Apr 11, 2023 Updated 2 years ago
- AI Cluster Observability & Troubleshooting Toolkit. Powered by SII & Infrawaves. ☆33 Feb 10, 2026 Updated 3 weeks ago
- Perplexity GPU Kernels ☆567 Nov 7, 2025 Updated 4 months ago
- ☆87 Jan 23, 2025 Updated last year
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆150 May 10, 2025 Updated 9 months ago
- triton for dsa ☆58 Updated this week
- Nex Venus Communication Library ☆72 Nov 17, 2025 Updated 3 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆898 Feb 28, 2026 Updated last week
- NGC Container Replicator ☆28 Dec 26, 2022 Updated 3 years ago
- ☆527 Feb 10, 2026 Updated 3 weeks ago
- PyTorch distributed training acceleration framework ☆54 Aug 13, 2025 Updated 6 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,264 Aug 28, 2025 Updated 6 months ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆34 Feb 10, 2025 Updated last year
- FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang… ☆214 Updated this week
- 🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA. ☆255 Feb 13, 2026 Updated 3 weeks ago
- The driver for LMCache core to run in vLLM ☆61 Feb 4, 2025 Updated last year
- A Triton-only attention backend for vLLM ☆24 Feb 11, 2026 Updated 3 weeks ago
- This project includes a simulator and workload generator for Edge-to-Cloud environments. Users can implement different scenarios, includi… ☆15 Aug 7, 2024 Updated last year
- PathwaysJob API is an OSS Kubernetes-native API, to deploy ML training and batch inference workloads, using Pathways on GKE. ☆19 Oct 22, 2025 Updated 4 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆96 Feb 20, 2026 Updated 2 weeks ago