CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) created by NVIDIA for general-purpose computing on its own GPUs (Graphics Processing Units). It lets developers use the parallel processing capabilities of NVIDIA GPUs to accelerate computation-heavy tasks such as matrix operations, physics simulations, deep learning training, and real-time video processing. Developers write kernel functions in a C/C++-based language extension; kernels run on the GPU, while host code manages memory and data transfers between the host (CPU) and the device (GPU). For workloads with enough data parallelism, CUDA can deliver significant speedups, and it integrates with many programming environments, including Python through libraries such as PyCUDA or frameworks such as TensorFlow with GPU support. To use CUDA effectively, developers need to understand its basic concepts: kernels, threads, blocks, and warps.
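The relationship between kernels, threads, blocks, and warps can be illustrated with a small pure-Python emulation. This is only a sketch of the indexing model, not real GPU code: `launch` and `vector_add_kernel` are hypothetical names invented for this example, the loops run sequentially on the CPU, and the warp size of 32 is assumed as on current NVIDIA GPUs.

```python
import math

WARP_SIZE = 32  # assumed threads per warp (true of current NVIDIA GPUs)


def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Work done by one emulated thread: add a single pair of elements."""
    # Global index, the same formula a CUDA kernel computes from
    # blockIdx.x * blockDim.x + threadIdx.x.
    i = block_idx * block_dim + thread_idx
    # Bounds guard: the last block may contain threads past the end of the data.
    if i < len(out):
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical stand-in for a kernel launch; runs every thread in turn."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 1000
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
out = [0.0] * n

threads_per_block = 256
# Round up so that every element is covered by some thread.
blocks = math.ceil(n / threads_per_block)
# On real hardware each block is scheduled in groups of WARP_SIZE threads.
warps_per_block = threads_per_block // WARP_SIZE

launch(vector_add_kernel, blocks, threads_per_block, a, b, out)
```

On a real GPU the threads of a block execute in parallel rather than in a loop, and the input and output arrays would first have to be copied between host and device memory; the emulation only shows how each thread derives the element it is responsible for.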
View the most prominent open source CUDA projects in the list below.
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆80,418 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆27,836 · May 15, 2026 · Updated last week
- World's fastest and most advanced password recovery utility ☆25,950 · Feb 20, 2026 · Updated 3 months ago
- The open-source AI voice studio. Clone, dictate, create. ☆27,021 · Apr 26, 2026 · Updated 3 weeks ago
- Build and run Docker containers leveraging NVIDIA GPUs ☆17,544 · Dec 6, 2023 · Updated 2 years ago
- Instant neural graphics primitives: lightning fast NeRF and more ☆17,401 · Feb 2, 2026 · Updated 3 months ago
- kaldi-asr/kaldi is the official location of the Kaldi project. ☆15,393 · Sep 22, 2025 · Updated 8 months ago
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. ☆15,136 · May 15, 2026 · Updated last week
- CUDA on non-NVIDIA GPUs ☆14,198 · May 14, 2026 · Updated last week
- Open3D: A Modern Library for 3D Data Processing ☆13,595 · Updated this week
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ☆13,669 · Updated this week
- Solve puzzles. Learn CUDA. ☆12,144 · Sep 1, 2024 · Updated last year
- NumPy aware dynamic Python compiler using LLVM ☆11,023 · May 14, 2026 · Updated last week
- NumPy & SciPy for GPU ☆10,946 · May 16, 2026 · Updated last week
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆11,050 · Updated this week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆9,731 · May 13, 2026 · Updated last week
- cuDF - GPU DataFrame Library ☆9,636 · Updated this week
- Containers for machine learning ☆9,412 · Updated this week
- OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. ☆9,395 · Dec 4, 2025 · Updated 5 months ago
- Samples for CUDA Developers which demonstrates features in CUDA Toolkit ☆9,175 · May 13, 2026 · Updated last week
- A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other ma… ☆8,952 · Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer ☆8,282 · Updated this week
- Modular ZK(Zero Knowledge) backend accelerated by GPU ☆7,679 · Nov 29, 2024 · Updated last year
- Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO. ☆7,440 · Apr 24, 2026 · Updated 3 weeks ago
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. ☆6,912 · May 16, 2026 · Updated last week
- A Python framework for GPU-accelerated simulation, robotics, and machine learning. ☆6,666 · Updated this week
- A flexible framework of neural networks for deep learning ☆5,917 · Aug 28, 2023 · Updated 2 years ago
- FlashInfer: Kernel Library for LLM Serving ☆5,621 · May 16, 2026 · Updated last week
- ALIEN is a CUDA-powered artificial life simulation program. ☆5,409 · Updated this week
- Making it easier to work with shaders ☆5,294 · May 15, 2026 · Updated last week
- cuML - RAPIDS Machine Learning Library ☆5,195 · Updated this week
- Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust. ☆5,201 · Apr 29, 2026 · Updated 3 weeks ago
- A PyTorch Library for Accelerating 3D Deep Learning Research ☆5,087 · May 13, 2026 · Updated last week
- [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl ☆5,002 · Feb 8, 2024 · Updated 2 years ago
- ArrayFire: a general purpose GPU library. ☆4,885 · Mar 7, 2026 · Updated 2 months ago
- A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment. ☆5,019 · Updated this week
- Optimized primitives for collective multi-GPU communication ☆4,729 · Updated this week
- Tengine is a lite, high performance, modular inference engine for embedded device ☆4,523 · Mar 6, 2025 · Updated last year
- Lightning fast C++/CUDA neural network framework ☆4,485 · Apr 21, 2026 · Updated last month
- Fast inference engine for Transformer models ☆4,491 · Updated this week