mlcommons / inference_results_v4.0
This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark.
☆ 10, updated 2 months ago
Related projects:
- This repository contains the results and code for the MLPerf™ Training v3.0 benchmark. (☆ 12, updated last year)
- This repository contains the results and code for the MLPerf™ Inference v1.1 benchmark. (☆ 11, updated 6 months ago)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆ 35, updated this week)
- RCCL Performance Benchmark Tests (☆ 41, updated last week)
- OpenAI Triton backend for Intel® GPUs (☆ 126, updated this week)
- Experimental projects related to TensorRT (☆ 62, updated this week)
- Reference implementations of MLPerf™ HPC training benchmarks (☆ 39, updated 3 months ago)
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… (☆ 56, updated 3 weeks ago)
- Intel® Tensor Processing Primitives extension for PyTorch* (☆ 10, updated last week)
- Test suite for probing the numerical behavior of NVIDIA tensor cores (☆ 29, updated last month)
- This repository contains the results and code for the MLPerf™ Inference v1.0 benchmark. (☆ 30, updated last year)
- CUDA GPU Benchmark (☆ 15, updated 2 months ago)
- Stores documents and resources used by the OpenXLA developer community (☆ 105, updated last month)
- Provides examples for writing and building Habana custom kernels using HabanaTools (☆ 15, updated last week)
- High Performance Linpack for Next-Generation AMD HPC Accelerators (☆ 41, updated last week)
- HPCG benchmark based on the ROCm platform (☆ 35, updated 2 months ago)
- Example of using PyTorch's open device registration API (☆ 25, updated last year)
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth (☆ 96, updated 7 years ago)
- A simple tool to profile the performance of multiple cuBLAS GEMM combinations (☆ 24, updated 3 years ago)
- MSCCL++: A GPU-driven communication stack for scalable AI applications (☆ 233, updated this week)
- Bandwidth test for ROCm (☆ 45, updated this week)
- Examples of how to call collective operation functions in multi-GPU environments. A simple example of using broadcast, reduce, all… (☆ 22, updated last year)
- Convert nvprof profiles into about:tracing-compatible JSON files (☆ 67, updated 3 years ago)
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems (☆ 126, updated 4 years ago)
- ROCm BLAS marshalling library (☆ 110, updated this week)