mlcommons / inference
Reference implementations of MLPerf™ inference benchmarks
☆1,386 · Updated this week
Alternatives and similar repositories for inference
Users interested in inference are comparing it to the libraries listed below
- Reference implementations of MLPerf™ training benchmarks ☆1,673 · Updated 2 weeks ago
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆866 · Updated last week
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆986 · Updated 8 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,435 · Updated last week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆400 · Updated this week
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/ ☆1,342 · Updated this week
- The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem. ☆1,545 · Updated this week
- nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculat… ☆942 · Updated last week
- Triton Python, C++ and Java client libraries, and GRPC-generated client examples for Go, Java and Scala (a minimal Python client sketch follows this list) ☆623 · Updated 2 weeks ago
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads. ☆871 · Updated 5 months ago
- ☆415 · Updated last week
- NCCL Tests ☆1,125 · Updated 3 weeks ago
- ONNX Optimizer ☆715 · Updated this week
- A library to analyze PyTorch traces. ☆379 · Updated this week
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters. ☆818 · Updated this week
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ☆1,797 · Updated this week
- AIMET is a library that provides advanced quantization and compression techniques for trained neural network models. ☆2,318 · Updated this week
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,046 · Updated last year
- CUDA Kernel Benchmarking Library ☆650 · Updated this week
- Dive into Deep Learning Compiler ☆645 · Updated 2 years ago
- The Tensor Algebra SuperOptimizer for Deep Learning ☆714 · Updated 2 years ago
- This repository contains tutorials and examples for Triton Inference Server ☆713 · Updated this week
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (a sketch of the smoothing step follows this list) ☆1,417 · Updated 10 months ago
- An open-source efficient deep learning framework/compiler, written in Python. ☆698 · Updated this week
- Collective communications library with various primitives for multi-machine training. ☆1,306 · Updated this week
- cudnn_frontend provides a C++ wrapper for the cudnn backend API and samples on how to use it ☆568 · Updated this week
- common in-memory tensor structure (a zero-copy exchange sketch follows this list) ☆1,002 · Updated 3 weeks ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,419 · Updated this week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆476 · Updated last month
- Neural Network Compression Framework for enhanced OpenVINO™ inference ☆1,034 · Updated this week
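
For the Triton client libraries entry above, here is a minimal sketch of an HTTP inference request in Python. The server address, model name, and tensor names (`resnet50`, `input__0`, `output__0`) are illustrative assumptions, not values taken from any repository in the list.

```python
# Minimal Triton HTTP client sketch; model and tensor names are assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One FP32 input of shape (1, 3, 224, 224), filled with random data.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("output__0")

# Send the request and read the output tensor back as a NumPy array.
result = client.infer(model_name="resnet50", inputs=[inp], outputs=[out])
print(result.as_numpy("output__0").shape)
```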
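
For the SmoothQuant entry, the sketch below shows the core smoothing step in plain PyTorch: per-channel scales s_j = max|X_j|^α / max|W_j|^(1−α) migrate quantization difficulty from activations to weights while leaving the linear layer's output unchanged. The toy shapes and α = 0.5 are assumptions for illustration, not the paper's released code.

```python
# Sketch of SmoothQuant-style per-channel smoothing (not the authors' code).
import torch

def smooth(act: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """act: (tokens, in_features), weight: (out_features, in_features)."""
    act_max = act.abs().amax(dim=0).clamp(min=1e-5)       # per-channel activation range
    w_max = weight.abs().amax(dim=0).clamp(min=1e-5)      # per-channel weight range
    scales = act_max.pow(alpha) / w_max.pow(1.0 - alpha)  # s_j = |X_j|^a / |W_j|^(1-a)
    return act / scales, weight * scales                  # X @ W.T is preserved

x = torch.randn(16, 64) * torch.linspace(0.1, 20.0, 64)   # simulate outlier channels
w = torch.randn(128, 64)
x_s, w_s = smooth(x, w)
# Same linear output, but x_s has a much flatter per-channel range,
# so it quantizes with less error.
assert torch.allclose(x @ w.T, x_s @ w_s.T, atol=1e-3)
```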
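
For the DLPack ("common in-memory tensor structure") entry, a minimal zero-copy exchange sketch between NumPy and PyTorch; it assumes versions where both sides implement the DLPack protocol (roughly NumPy ≥ 1.22 and PyTorch ≥ 1.11).

```python
# Zero-copy tensor exchange via DLPack between NumPy and PyTorch.
# Assumes NumPy >= 1.22 and PyTorch >= 1.11, where both expose __dlpack__.
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_dlpack(a)   # wraps the same buffer, no copy is made
t[0, 0] = 42.0
print(a[0, 0])             # 42.0 -- the NumPy array sees the write
```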