NVIDIA/nvbench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/NVIDIA/nvbench)

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

☆900

Alternatives and similar repositories for nvbench

Users that are interested in nvbench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

NVIDIA / nvbench_demo
View on GitHub
Simple starter CMake project that uses NVBench.
☆15May 6, 2025Updated last year
NVIDIA / jitify
View on GitHub
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
☆573Sep 15, 2025Updated 10 months ago
NVIDIA / cccl
View on GitHub
CUDA Core Compute Libraries
☆2,431Updated this week
RRZE-HPC / gpu-benches
View on GitHub
collection of benchmarks to measure basic GPU capabilities
☆530Oct 24, 2025Updated 8 months ago
NVIDIA / cub
View on GitHub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,840Oct 9, 2023Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
NVIDIA / cutlass
View on GitHub
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104Updated this week
NVIDIA / cuCollections
View on GitHub
☆654Updated this week
NVIDIA / multi-gpu-programming-models
View on GitHub
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆908Sep 26, 2025Updated 9 months ago
NVIDIA / NVTX
View on GitHub
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆544Updated this week
HazyResearch / ThunderKittens
View on GitHub
Tile primitives for speedy kernels
☆3,550Jul 13, 2026Updated last week
NVIDIA / CUDALibrarySamples
View on GitHub
CUDA Library Samples
☆2,463Updated this week
NVIDIA / Fuser
View on GitHub
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆396May 31, 2026Updated last month
NVIDIA / gdrcopy
View on GitHub
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399Updated this week
flashinfer-ai / cutlass-viz
View on GitHub
☆65Apr 26, 2025Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,493Jul 11, 2026Updated last week
Dao-AILab / quack
View on GitHub
A Quirky Assortment of CuTe Kernels
☆1,060Updated this week
NVIDIA / MatX
View on GitHub
An efficient C++20 GPU numerical computing library with Python-like syntax
☆1,438Updated this week
ekondis / mixbench
View on GitHub
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
☆462Jul 12, 2026Updated last week
rapidsai / rmm
View on GitHub
RAPIDS Memory Manager
☆705Updated this week
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆5,983Updated this week
NVIDIA / nvbandwidth
View on GitHub
A tool for bandwidth measurements on NVIDIA GPUs.
☆732Apr 8, 2026Updated 3 months ago
sjfeng1999 / gpu-arch-microbenchmark
View on GitHub
Dissecting NVIDIA GPU Architecture
☆125Jul 11, 2022Updated 4 years ago
NVIDIA / TransformerEngine
View on GitHub
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,434Updated this week
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
cloudcores / CuAssembler
View on GitHub
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆609Apr 20, 2023Updated 3 years ago
NVIDIA / libcudacxx
View on GitHub
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
☆2,304Feb 7, 2024Updated 2 years ago
KnowingNothing / MatmulTutorial
View on GitHub
A Easy-to-understand TensorOp Matmul Tutorial
☆445Mar 5, 2026Updated 4 months ago
mirage-project / mirage
View on GitHub
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,377Updated this week
TiledTensor / TiledCUDA
View on GitHub
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192Jan 28, 2025Updated last year
flashinfer-ai / cubloaty
View on GitHub
a size profiler for cuda binary
☆71Jan 15, 2026Updated 6 months ago
IST-DASLab / marlin
View on GitHub
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆1,109Sep 4, 2024Updated last year
NVIDIA / tilus
View on GitHub
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆489Jul 5, 2026Updated 2 weeks ago
NVIDIA / cuda-samples
View on GitHub
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
☆9,404May 27, 2026Updated last month
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
bytedance / flux
View on GitHub
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,343Aug 28, 2025Updated 10 months ago
meta-pytorch / tritonbench
View on GitHub
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆361Updated this week
Bruce-Lee-LY / cuda_hgemm
View on GitHub
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆556Sep 8, 2024Updated last year
ColfaxResearch / cutlass-kernels
View on GitHub
☆269Jul 11, 2024Updated 2 years ago
sunlex0717 / DissectingTensorCores
View on GitHub
☆114Apr 19, 2024Updated 2 years ago
microsoft / mscclpp
View on GitHub
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆541Updated this week
microsoft / nnfusion
View on GitHub
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002Sep 19, 2024Updated last year