NVIDIA's curated collection of educational resources for general-purpose GPU programming.
☆1,254 · Updated this week
Alternatives and similar repositories for accelerated-computing-hub
Users interested in accelerated-computing-hub are comparing it to the libraries listed below.
- CUDA Core Compute Libraries ☆2,182 · Updated this week
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆57 · Feb 20, 2026 · Updated last week
- CUDA Python: Performance meets Productivity ☆3,173 · Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆552 · Jan 16, 2026 · Updated last month
- RAPIDS Deployment Documentation ☆15 · Feb 13, 2026 · Updated 2 weeks ago
- CUDA Kernel Benchmarking Library ☆820 · Updated this week
- Some CUDA example code with READMEs. ☆179 · Nov 11, 2025 · Updated 3 months ago
- The CUDA target for Numba ☆259 · Updated this week
- ☆624 · Feb 20, 2026 · Updated last week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆9,315 · Updated this week
- GPU programming-related news and material links ☆1,997 · Sep 17, 2025 · Updated 5 months ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆869 · Sep 26, 2025 · Updated 5 months ago
- Material for gpu-mode lectures ☆5,773 · Feb 1, 2026 · Updated 3 weeks ago
- Helpful kernel tutorials and examples for tile-based GPU programming ☆654 · Updated this week
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆70 · Apr 14, 2025 · Updated 10 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆252 · May 6, 2025 · Updated 9 months ago
- Fast low-bit matmul kernels in Triton ☆433 · Feb 1, 2026 · Updated 3 weeks ago
- The Open GPU Server for CI purposes. ☆15 · Feb 16, 2026 · Updated last week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆466 · Dec 31, 2025 · Updated last month
- RAPIDS Memory Manager ☆683 · Updated this week
- ☆65 · Apr 26, 2025 · Updated 10 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆890 · Feb 20, 2026 · Updated last week
- A lightweight design for computation-communication overlap. ☆221 · Jan 20, 2026 · Updated last month
- Container recipes to build the entire Xeus-Cling and Cling stack, including the CUDA extension, with just a few comm… ☆10 · Dec 22, 2020 · Updated 5 years ago
- Perplexity GPU Kernels ☆564 · Nov 7, 2025 · Updated 3 months ago
- [DEPRECATED] Moved to the ROCm/rocm-systems repo ☆165 · Feb 16, 2026 · Updated last week
- An efficient C++20 GPU numerical computing library with Python-like syntax ☆1,405 · Updated this week
- KvikIO - High Performance File IO ☆247 · Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆327 · Updated this week
- Fastest kernels written from scratch ☆548 · Sep 18, 2025 · Updated 5 months ago
- A Datacenter Scale Distributed Inference Serving Framework ☆6,117 · Updated this week
- Distributed Compiler based on Triton for Parallel Systems ☆1,361 · Feb 13, 2026 · Updated 2 weeks ago
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆851 · Updated this week
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆942 · Aug 19, 2024 · Updated last year
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆353 · Dec 3, 2025 · Updated 2 months ago
- Example ML projects that use the Determined library. ☆32 · Sep 11, 2024 · Updated last year
- CUDA Library Samples ☆2,324 · Updated this week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,261 · Aug 28, 2025 · Updated 6 months ago
- YAKL is A Kokkos Layer: a simple C++ framework for performance portability and Fortran code porting ☆69 · Sep 9, 2025 · Updated 5 months ago