microsoft/superbenchmark

A validation and profiling tool for AI infrastructure

☆382

Alternatives and similar repositories for superbenchmark

Users who are interested in superbenchmark often compare it to the repositories listed below.

microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆541 · Updated this week
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
microsoft / ark
A GPU-driven system framework for scalable AI applications
☆130 · Updated this week
NVIDIA / nccl-tests
NCCL Tests
☆1,595 · Jul 9, 2026 · Updated last week
microsoft / Tutel
Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
☆996 · Jul 8, 2026 · Updated last week
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year
SymbioticLab / Oobleck
A resilient distributed training framework
☆100 · Apr 11, 2024 · Updated 2 years ago
microsoft / NPKit
NCCL Profiling Kit
☆155 · Jul 1, 2024 · Updated 2 years ago
microsoft / TrainVerify
A verification tool for ensuring parallelization equivalence in distributed model training.
☆17 · Sep 1, 2025 · Updated 10 months ago
feifeibear / PyTorchMemTracer
Depict GPU memory footprint during DNN training of PyTorch
☆11 · Nov 17, 2022 · Updated 3 years ago
volcengine / veScale
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,031 · Mar 3, 2026 · Updated 4 months ago
aws-samples / ec2-topology-aware-for-slurm
☆13 · May 30, 2025 · Updated last year
parasailteam / coconet
☆85 · Dec 2, 2022 · Updated 3 years ago
microsoft / SuperScaler
An experimental parallel training platform
☆57 · Mar 25, 2024 · Updated 2 years ago
Azure / msccl
Microsoft Collective Communication Library
☆66 · Nov 23, 2024 · Updated last year
msr-fiddle / philly-traces
☆199 · Aug 31, 2019 · Updated 6 years ago
NVIDIA / nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
☆734 · Updated this week
InternLM / AcmeTrace
☆179 · Mar 12, 2024 · Updated 2 years ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆145 · Jun 25, 2022 · Updated 4 years ago
yuuki / rpingmesh
A service-aware RoCE network monitoring system based on end-to-end probing.
☆30 · Jul 6, 2026 · Updated 2 weeks ago
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆233 · Apr 3, 2026 · Updated 3 months ago
imbue-ai / cluster-health
Scripts for managing a large H100 cluster and fixing hardware issues to ensure smooth model training.
☆322 · Aug 20, 2024 · Updated last year
zhuohan123 / terapipe
☆79 · May 4, 2021 · Updated 5 years ago
NVIDIA / cloudai
CloudAI Benchmark Framework
☆96 · Updated this week
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆165 · Jun 10, 2026 · Updated last month
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆974 · Updated this week
microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆125 · Apr 8, 2024 · Updated 2 years ago
LeiWang1999 / nvdla_loadables
Some sample caffemodel, prototxt, test images, and precompiled loadables.
☆14 · Apr 30, 2021 · Updated 5 years ago
microsoft / AttentionEngine
☆123 · May 19, 2025 · Updated last year
geoffxy / habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
☆63 · Nov 26, 2022 · Updated 3 years ago
alpa-projects / alpa
Training and serving large-scale neural networks with auto parallelization.
☆3,178 · Dec 9, 2023 · Updated 2 years ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,435 · Updated this week
microsoft / openpaisdk
OpenPAI SDK
☆19 · Dec 10, 2022 · Updated 3 years ago
open-neutrino / neutrino
☆263 · Dec 25, 2025 · Updated 6 months ago
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399 · Updated this week
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 · Jan 20, 2025 · Updated last year
NVIDIA / cuda-checkpoint
CUDA checkpoint and restore utility
☆474 · Jul 6, 2026 · Updated 2 weeks ago
microsoft / hivedscheduler
Kubernetes Scheduler for Deep Learning
☆263 · May 22, 2022 · Updated 4 years ago
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,893 · Updated this week