infinigence / FUSCO
High-performance distributed data shuffling (all-to-all) library for MoE training and inference
☆107 · Updated 3 weeks ago
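The all-to-all exchange FUSCO accelerates is the "dispatch" step of distributed MoE: after routing, every rank sends each token to the rank hosting its assigned expert. A minimal single-process sketch of that pattern (this is not FUSCO's API; the `all_to_all` helper and the 2-rank example are purely illustrative, standing in for a real collective such as NCCL's):

```python
# Minimal single-process sketch of the MoE all-to-all "dispatch" pattern.
# This only simulates, on plain Python lists, what an all-to-all exchange
# does with routed tokens; a real library performs it across GPUs.

def all_to_all(send_buckets):
    """Simulate all-to-all among len(send_buckets) ranks.

    send_buckets[i][j] is the list of tokens rank i sends to rank j.
    Returns recv_buckets where recv_buckets[j][i] is what rank j received
    from rank i.
    """
    world = len(send_buckets)
    return [[send_buckets[i][j] for i in range(world)] for j in range(world)]

# Example: 2 ranks, one expert per rank. The router has tagged each token
# with an expert id; a token bound for expert e must travel to rank e.
tokens_per_rank = {0: [("t0", 1), ("t1", 0)],   # (token, destination rank)
                   1: [("t2", 0), ("t3", 0)]}
world = 2

# Build per-destination send buckets on each rank (the dispatch step).
send = [[[] for _ in range(world)] for _ in range(world)]
for rank, toks in tokens_per_rank.items():
    for tok, dst in toks:
        send[rank][dst].append(tok)

recv = all_to_all(send)
# Rank 0 now holds every token routed to expert 0, rank 1 those for expert 1.
print(recv[0])  # [['t1'], ['t2', 't3']]
print(recv[1])  # [['t0'], []]
```

After the experts run, the same exchange is applied in reverse (the "combine" step) to return results to the tokens' home ranks; the performance work in libraries like FUSCO lies in overlapping and batching these transfers, not in the pattern itself.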
Alternatives and similar repositories for FUSCO
Users interested in FUSCO are comparing it to the libraries listed below.
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆91 · Updated last week
- DeeperGEMM: crazy optimized version ☆73 · Updated 8 months ago
- A lightweight design for computation-communication overlap. ☆213 · Updated 3 weeks ago
- ☆65 · Updated 8 months ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ☆153 · Updated 4 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆79 · Updated 3 weeks ago
- ☆340 · Updated 2 weeks ago
- ☆52 · Updated 8 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆183 · Updated this week
- ☆82 · Updated 3 months ago
- Tile-based language built for AI computation across all scales ☆116 · Updated last week
- Autonomous GPU Kernel Generation via Deep Agents ☆217 · Updated this week
- A size profiler for CUDA binaries ☆69 · Updated last week
- Efficient Compute-Communication Overlap for Distributed LLM Inference ☆68 · Updated 2 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on Multi-GPU Clusters ☆55 · Updated last year
- High-performance Transformer implementation in C++. ☆148 · Updated last year
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆80 · Updated last month
- Building the Virtuous Cycle for AI-driven LLM Systems ☆121 · Updated this week
- Utility scripts for PyTorch (e.g. make Perfetto show some disappearing kernels, a memory profiler that understands more low-level allocatio… ☆80 · Updated 4 months ago
- ☆38 · Updated 6 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆82 · Updated last year
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆34 · Updated last year
- ☆117 · Updated 8 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆79 · Updated last month
- ☆78 · Updated this week
- nnScaler: Compiling DNN models for Parallel Training ☆123 · Updated 3 months ago
- Nex Venus Communication Library ☆72 · Updated 2 months ago
- ☆32 · Updated 6 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆96 · Updated last month
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆65 · Updated last month