NVIDIA's curated collection of educational resources on general-purpose GPU programming.
☆1,325 · Mar 15, 2026 · Updated this week
Alternatives and similar repositories for accelerated-computing-hub
Users who are interested in accelerated-computing-hub are comparing it to the repositories listed below.
- RAPIDS Deployment Documentation ☆15 · Mar 11, 2026 · Updated last week
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆57 · Updated this week
- CUDA Core Compute Libraries ☆2,217 · Updated this week
- CUDA Python: Performance meets Productivity ☆3,185 · Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆553 · Mar 11, 2026 · Updated last week
- Some CUDA example code with READMEs. ☆180 · Nov 11, 2025 · Updated 4 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆70 · Apr 14, 2025 · Updated 11 months ago
- CUDA Kernel Benchmarking Library ☆831 · Updated this week
- ☆627 · Mar 12, 2026 · Updated last week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆874 · Sep 26, 2025 · Updated 5 months ago
- The CUDA target for Numba ☆263 · Updated this week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆9,442 · Updated this week
- Material for gpu-mode lectures ☆5,841 · Feb 1, 2026 · Updated last month
- Fast low-bit matmul kernels in Triton ☆438 · Feb 1, 2026 · Updated last month
- GPU programming related news and material links ☆2,047 · Mar 8, 2026 · Updated last week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆252 · May 6, 2025 · Updated 10 months ago
- An efficient C++20 GPU numerical computing library with Python-like syntax ☆1,406 · Updated this week
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels ☆20 · Jul 13, 2025 · Updated 8 months ago
- RAPIDS Memory Manager ☆685 · Updated this week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆477 · Mar 10, 2026 · Updated last week
- Helpful kernel tutorials and examples for tile-based GPU programming ☆675 · Updated this week
- KvikIO - High Performance File IO ☆255 · Updated this week
- Pseudo-spectral code for DNS of homogeneous isotropic turbulence. Scalars and particles are also supported. ☆11 · Oct 19, 2023 · Updated 2 years ago
- Dragon distributed runtime for HPC and AI applications and workflows ☆89 · Mar 3, 2026 · Updated 2 weeks ago
- ☆65 · Apr 26, 2025 · Updated 10 months ago
- Offline as of 2026-03-13 ☆15 · Mar 13, 2026 · Updated last week
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆466 · Mar 10, 2025 · Updated last year
- Fastest kernels written from scratch ☆559 · Sep 18, 2025 · Updated 6 months ago
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆874 · Updated this week
- LLM training in simple, raw C/CUDA ☆112 · May 1, 2024 · Updated last year
- Samples for CUDA developers that demonstrate features in the CUDA Toolkit ☆8,953 · Jan 6, 2026 · Updated 2 months ago
- How to call NVTX from Fortran ☆12 · Jun 25, 2025 · Updated 8 months ago
- CUDA Library Samples ☆2,346 · Updated this week
- Kokkos C++ Performance Portability Programming Ecosystem: Profiling and Debugging Tools ☆140 · Updated this week
- ☆307 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆6,250 · Updated this week
- A Python framework for accelerated simulation, data generation and spatial computing. ☆6,326 · Updated this week
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆947 · Aug 19, 2024 · Updated last year
- A lightweight design for computation-communication overlap. ☆225 · Jan 20, 2026 · Updated 2 months ago