NVIDIA/cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

☆1,840

Alternatives and similar repositories for cub

Users who are interested in cub are comparing it to the libraries listed below.

NVIDIA / thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
☆5,004 · Feb 8, 2024 · Updated 2 years ago

moderngpu / moderngpu
Patterns and behaviors for GPU computing
☆1,782 · Jan 17, 2026 · Updated 6 months ago

NVIDIA / cuCollections
☆654 · Updated this week

NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week

NVIDIA / libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
☆2,304 · Feb 7, 2024 · Updated 2 years ago

NVIDIA / jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
☆573 · Sep 15, 2025 · Updated 10 months ago

NVIDIA / cccl
CUDA Core Compute Libraries
☆2,431 · Updated this week

cudpp / cudpp
CUDA Data Parallel Primitives Library
☆438 · Nov 9, 2018 · Updated 7 years ago

NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆900 · Updated this week

NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,892 · Updated this week

NVIDIA / CUDALibrarySamples
CUDA Library Samples
☆2,463 · Updated this week

NVIDIA-developer-blog / code-samples
Source code examples from the Parallel Forall Blog
☆1,332 · Sep 23, 2025 · Updated 9 months ago

NVIDIA / cuda-samples
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
☆9,404 · May 27, 2026 · Updated last month

eyalroz / cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
☆900 · Updated this week

NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,072 · Jan 3, 2023 · Updated 3 years ago

gunrock / gunrock
Programmable CUDA/C++ GPU Graph Analytics
☆1,096 · Feb 28, 2026 · Updated 4 months ago

rapidsai / rmm
RAPIDS Memory Manager
☆705 · Updated this week

stotko / stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
☆1,265 · Jul 8, 2026 · Updated last week

NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆908 · Sep 26, 2025 · Updated 9 months ago

microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year

NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399 · Updated this week

NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆87 · Feb 21, 2024 · Updated 2 years ago

apache / tvm
Open Machine Learning Compiler Framework
☆13,588 · Updated this week

dmlc / dlpack
common in-memory tensor structure
☆1,232 · Jun 19, 2026 · Updated last month

triton-lang / triton
Development repository for the Triton language and compiler
☆19,725 · Updated this week

halide / Halide
a language for fast, portable data-parallel computation
☆6,563 · Updated this week

NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,439 · Mar 27, 2024 · Updated 2 years ago

NVIDIA / MatX
An efficient C++20 GPU numerical computing library with Python-like syntax
☆1,438 · Updated this week

alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆932 · Dec 30, 2024 · Updated last year

cloudcores / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆609 · Apr 20, 2023 · Updated 3 years ago

NVIDIA / TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone…
☆13,164 · Jul 7, 2026 · Updated last week

Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago

NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,434 · Updated this week

NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆396 · May 31, 2026 · Updated last month

flame / how-to-optimize-gemm
☆2,020 · Jul 29, 2023 · Updated 2 years ago

NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆544 · Updated this week

pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,570 · Updated this week

arrayfire / arrayfire
ArrayFire: a general purpose GPU library.
☆4,896 · Mar 7, 2026 · Updated 4 months ago

NVIDIA / raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-a…
☆1,029 · Updated this week