CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA for general-purpose computing on its own graphics processing units (GPUs). It lets developers use the parallel processing capabilities of NVIDIA GPUs to accelerate computation-heavy tasks such as matrix operations, physics simulations, deep learning training, and real-time video processing. CUDA extends C and C++ with a small set of language additions that let developers write kernel functions, which execute on the GPU, and manage memory between the host (CPU) and the device (GPU). In workloads with enough data parallelism, CUDA can deliver significant speedups over CPU-only code, and it integrates with other programming environments, including Python through libraries like PyCUDA or through frameworks like TensorFlow with GPU support. To program GPUs effectively with CUDA, developers need to understand a few basic concepts: kernels, threads, blocks, and warps.
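As an illustration of the concepts named above, the following is a minimal sketch of an element-wise vector addition, not a tuned implementation. The names `addKernel`, `addOnGpu`, and `check` are hypothetical and chosen for this example; the runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaFree`, `cudaGetLastError`) come from the CUDA runtime API. It assumes a machine with an NVIDIA GPU and the CUDA Toolkit installed.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Kernel: runs on the device, with one thread per output element.
__global__ void addKernel(const float* a, const float* b, float* out, int n) {
    // Global index = offset of this block + offset of the thread inside it.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        out[i] = a[i] + b[i];
    }
}

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Host helper (hypothetical name): copies the inputs to the device,
// launches the kernel, and copies the result back to the host.
std::vector<float> addOnGpu(const std::vector<float>& a,
                            const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<float> out(n);
    if (n == 0) return out;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Allocate device (GPU) memory for both inputs and the output.
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    check(cudaMalloc(&dA, bytes), "cudaMalloc a");
    check(cudaMalloc(&dB, bytes), "cudaMalloc b");
    check(cudaMalloc(&dOut, bytes), "cudaMalloc out");

    // Copy the inputs from host (CPU) memory to device memory.
    check(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice), "copy b");

    // 256 threads per block is a multiple of the 32-thread warp size;
    // the block count is rounded up so that every element is covered.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addKernel<<<blocks, threadsPerBlock>>>(dA, dB, dOut, n);
    check(cudaGetLastError(), "kernel launch");

    // This copy waits for the kernel to finish before reading the result.
    check(cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost), "copy out");

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

Compiled with `nvcc`, a call such as `addOnGpu({1, 2, 3}, {10, 20, 30})` returns the element-wise sums. The block size of 256 is an arbitrary but common starting point; the best value depends on the kernel and the GPU.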
View the most prominent open source CUDA projects in the list below.
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆30,423 · Updated this week
- World's fastest and most advanced password recovery utility ☆21,350 · Updated 3 months ago
- Build and run Docker containers leveraging NVIDIA GPUs ☆17,259 · Updated 11 months ago
- Instant neural graphics primitives: lightning fast NeRF and more ☆16,038 · Updated last week
- kaldi-asr/kaldi is the official location of the Kaldi project. ☆14,298 · Updated last month
- Open3D: A Modern Library for 3D Data Processing ☆11,485 · Updated this week
- NumPy aware dynamic Python compiler using LLVM ☆9,987 · Updated this week
- Solve puzzles. Learn CUDA. ☆9,933 · Updated 2 months ago
- CUDA on non-NVIDIA GPUs ☆9,759 · Updated this week
- NumPy & SciPy for GPU ☆9,490 · Updated this week
- cuDF - GPU DataFrame Library ☆8,451 · Updated this week
- A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other ma… ☆8,092 · Updated this week
- Containers for machine learning ☆8,090 · Updated this week
- Modular ZK (Zero Knowledge) backend accelerated by GPU ☆7,777 · Updated 2 weeks ago
- Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO. ☆6,722 · Updated this week
- Samples for CUDA developers that demonstrate features in the CUDA Toolkit ☆6,458 · Updated 3 months ago
- OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. ☆5,910 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆6,127 · Updated this week
- A flexible framework of neural networks for deep learning ☆5,893 · Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆5,679 · Updated this week
- [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl ☆4,924 · Updated 9 months ago
- ALIEN is a CUDA-powered artificial life simulation program. ☆4,946 · Updated this week
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. ☆4,828 · Updated 3 weeks ago
- Tengine is a lite, high performance, modular inference engine for embedded devices ☆4,653 · Updated 2 months ago
- ArrayFire: a general purpose GPU library. ☆4,567 · Updated 2 weeks ago
- A PyTorch Library for Accelerating 3D Deep Learning Research ☆4,503 · Updated this week
- cuML - RAPIDS Machine Learning Library ☆4,243 · Updated this week
- HIP: C++ Heterogeneous-Compute Interface for Portability ☆3,763 · Updated this week
- Lightning fast C++/CUDA neural network framework ☆3,765 · Updated 2 months ago
- Fast inference engine for Transformer models ☆3,411 · Updated this week
- LightSeq: A High Performance Library for Sequence Processing and Generation ☆3,210 · Updated last year
- Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer ☆3,145 · Updated 2 months ago
- Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. ☆3,096 · Updated this week
- A GPU-powered real-time analytics storage and query engine. ☆3,032 · Updated 4 months ago
- HeavyDB (formerly OmniSciDB) ☆2,956 · Updated 2 months ago
- A retargetable MLIR-based machine learning compiler and runtime toolkit. ☆2,846 · Updated this week
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,597 · Updated this week
- Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors ☆2,493 · Updated 8 months ago
- A data-parallel functional programming language ☆2,407 · Updated this week
- CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision. ☆2,381 · Updated last month