mlcommons / inferenceLinks

Reference implementations of MLPerf™ inference benchmarks

☆1,420

Alternatives and similar repositories for inference

Users that are interested in inference are comparing it to the libraries listed below

Sorting:

mlcommons / training
Reference implementations of MLPerf™ training benchmarks
☆1,696Updated last week
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,415Updated this week
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆887Updated this week
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆992Updated 10 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,587Updated this week
llvm / torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
☆1,591Updated last week
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆842Updated last week
NVIDIA / nccl-tests
NCCL Tests
☆1,199Updated last week
pytorch / benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
☆966Updated this week
intel / neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R…
☆2,461Updated last week
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆886Updated 7 months ago
ROCm / composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
☆444Updated this week
NVIDIA / TensorRT-Model-Optimizer
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. …
☆1,078Updated 2 weeks ago
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆3,889Updated last week
intel / ai-reference-models
Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Inte…
☆717Updated last week
NVIDIA / cudnn-frontend
cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
☆596Updated 2 weeks ago
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,809Updated this week
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆726Updated 2 years ago
pytorch / torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,056Updated last year
tensorflow / mlir-hlo
☆420Updated this week
msr-fiddle / pipedream
☆393Updated 2 years ago
onnx / optimizer
ONNX Optimizer
☆735Updated 2 weeks ago
NVIDIA / DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
☆551Updated 2 months ago
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆691Updated last week
openxla / stablehlo
Backward compatible ML compute opset inspired by HLO/MHLO
☆510Updated last week
d2l-ai / d2l-tvm
Dive into Deep Learning Compiler
☆646Updated 3 years ago
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆763Updated 5 months ago
dmlc / dlpack
common in-memory tensor structure
☆1,042Updated last month
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,169Updated last month
google / XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
☆2,072Updated last week