mlcommons / inference_results_v4.0

This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark.

☆10

Related projects:

mlcommons / training_results_v3.0
This repository contains the results and code for the MLPerf™ Training v3.0 benchmark.
☆12 Updated last year
mlcommons / inference_results_v1.1
This repository contains the results and code for the MLPerf™ Inference v1.1 benchmark.
☆11 Updated 6 months ago
hummingtree / cuda-graph-with-dynamic-parameters
☆13 Updated 2 years ago
ROCm / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆35 Updated this week
ROCm / rccl-tests
RCCL Performance Benchmark Tests
☆41 Updated last week
intel / xetla
☆53 Updated last week
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆126 Updated this week
NVIDIA / TensorRT-Incubator
Experimental projects related to TensorRT
☆62 Updated this week
mlcommons / hpc
Reference implementations of MLPerf™ HPC training benchmarks
☆39 Updated 3 months ago
intel / intel-extension-for-deepspeed
Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…
☆56 Updated 3 weeks ago
libxsmm / tpp-pytorch-extension
Intel® Tensor Processing Primitives extension for Pytorch*
☆10 Updated last week
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆29 Updated last month
mlcommons / inference_results_v1.0
This repository contains the results and code for the MLPerf™ Inference v1.0 benchmark.
☆30 Updated last year
sunlex0717 / DissectingTensorCores
☆73 Updated 5 months ago
hibagus / CUDA_Bench
CUDA GPU Benchmark
☆15 Updated 2 months ago
openxla / community
Stores documents and resources used by the OpenXLA developer community
☆105 Updated last month
HabanaAI / Habana_Custom_Kernel
Provides the examples to write and build Habana custom kernels using the HabanaTools
☆15 Updated last week
ROCm / rocHPL
High Performance Linpack for Next-Generation AMD HPC Accelerators
☆41 Updated last week
ROCm / rocHPCG
HPCG benchmark based on ROCm platform
☆35 Updated 2 months ago
bdhirsh / pytorch_open_registration_example
Example of using pytorch's open device registration API
☆25 Updated last year
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆96 Updated 7 years ago
intel / llvm-test-suite
☆20 Updated last year
jeng1220 / cuGemmProf
A simple tool to profile performance of multiple combinations of GEMM of cuBLAS
☆24 Updated 3 years ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆233 Updated this week
ROCm / rocm_bandwidth_test
Bandwidth test for ROCm
☆45 Updated this week
muriloboratto / NCCL
Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all…
☆22 Updated last year
ezyang / nvprof2json
Convert nvprof profiles into about:tracing compatible JSON files
☆67 Updated 3 years ago
cwpearson / nvidia-performance-tools
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
☆126 Updated 4 years ago
mnicely / cublasLt_examples
☆21 Updated this week
ROCm / hipBLAS
ROCm BLAS marshalling library
☆110 Updated this week