AI-HPC-Research-Team / AIPerfLinks

Automated machine learning as an AI-HPC benchmark

☆65

Alternatives and similar repositories for AIPerf

Users that are interested in AIPerf are comparing it to the libraries listed below

Sorting:

microsoft / msccl-tools
Synthesizer for optimal collective communication algorithms
☆118Updated last year
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆62Updated last year
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆210Updated this week
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆113Updated 5 months ago
1duo / nccl-examples
NCCL Examples from Official NVIDIA NCCL Developer Guide.
☆19Updated 7 years ago
microsoft / NPKit
NCCL Profiling Kit
☆145Updated last year
alibaba / GPU-scheduler-for-deep-learning
GPU-scheduler-for-deep-learning
☆210Updated 4 years ago
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆265Updated 2 months ago
shen203 / GPU_Microbenchmark
☆24Updated 3 years ago
netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆126Updated 3 years ago
SymbioticLab / Salus
Fine-grained GPU sharing primitives
☆146Updated 3 months ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
☆62Updated last year
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆52Updated last year
parasailteam / coconet
☆83Updated 2 years ago
calculon-ai / calculon
☆154Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆54Updated last year
amirgholami / ai_and_memory_wall
AI and Memory Wall
☆219Updated last year
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆122Updated 3 years ago
Funatiq / gossip
gossip: Efficient Communication Primitives for Multi-GPU Systems
☆59Updated 3 years ago
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆121Updated last year
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆42Updated 3 months ago
astra-sim / tacos
TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning
☆27Updated 4 months ago
Azure / msccl-executor-nccl
☆46Updated 10 months ago
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆76Updated 4 years ago
Mellanox / nv_peer_memory
☆376Updated last year
OpenPPL / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆83Updated 2 years ago
zw0610 / zw0610.github.io
☆58Updated 5 years ago
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆108Updated last year
casys-kaist / glet
☆53Updated 10 months ago
quiver-team / quiver-feature
High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph
☆55Updated 3 years ago