NVIDIA/cccl

CUDA Core Compute Libraries

☆2,431

Alternatives and similar repositories for cccl

Users who are interested in cccl are comparing it to the libraries listed below.

NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
NVIDIA / cuCollections
☆654 · Updated this week
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆900 · Updated this week
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,840 · Oct 9, 2023 · Updated 2 years ago
NVIDIA / MatX
An efficient C++20 GPU numerical computing library with Python-like syntax
☆1,438 · Updated this week
NVIDIA / thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
☆5,004 · Feb 8, 2024 · Updated 2 years ago
NVIDIA / cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
☆9,404 · May 27, 2026 · Updated last month
NVIDIA / libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
☆2,304 · Feb 7, 2024 · Updated 2 years ago
NVIDIA / CUDALibrarySamples
CUDA Library Samples
☆2,463 · Updated this week
NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆544 · Updated this week
NVIDIA / stdexec
`std::execution`, the standard C++ framework for asynchronous and parallel programming.
☆2,388 · Jul 12, 2026 · Updated last week
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆396 · May 31, 2026 · Updated last month
rapidsai / rmm
RAPIDS Memory Manager
☆705 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,983 · Updated this week
triton-lang / triton
Development repository for the Triton language and compiler
☆19,725 · Updated this week
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,892 · Updated this week
NVIDIA / jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
☆573 · Sep 15, 2025 · Updated 10 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,434 · Updated this week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,550 · Jul 13, 2026 · Updated last week
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆908 · Sep 26, 2025 · Updated 9 months ago
NVIDIA / cutile-python
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
☆2,118 · Updated this week
stotko / stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
☆1,265 · Jul 8, 2026 · Updated last week
NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆3,320 · Updated this week
BBuf / how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
☆3,141 · Updated this week
AdaptiveCpp / AdaptiveCpp
Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, …
☆1,908 · Updated this week
cloudcores / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
☆609 · Apr 20, 2023 · Updated 3 years ago
kokkos / kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
☆2,611 · Updated this week
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399 · Updated this week
NVIDIA / TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat…
☆14,158 · Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,060 · Updated this week
NVIDIA / raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-a…
☆1,029 · Updated this week
KnowingNothing / MatmulTutorial
An Easy-to-Understand TensorOp Matmul Tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
gpu-mode / lectures
Material for gpu-mode lectures
☆6,325 · Jun 15, 2026 · Updated last month
moderngpu / moderngpu
Patterns and behaviors for GPU computing
☆1,782 · Jan 17, 2026 · Updated 6 months ago
NVIDIA / FasterTransformer
Transformer-related optimization, including BERT and GPT
☆6,439 · Mar 27, 2024 · Updated 2 years ago
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
☆6,667 · Updated this week
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆556 · Sep 8, 2024 · Updated last year
eyalroz / cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
☆900 · Updated this week
Liu-xiandong / How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,329 · Jul 29, 2023 · Updated 2 years ago