CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) created by NVIDIA for general-purpose computing on its GPUs (Graphics Processing Units). It lets developers use the parallel processing capabilities of NVIDIA GPUs to accelerate computation-heavy tasks such as matrix operations, physics simulations, deep learning training, and real-time video processing. CUDA extends C and C++ so that developers can write kernel functions, which execute on the GPU, and manage memory transfers between the host (CPU) and the device (GPU). In suitable, highly parallel workloads CUDA can deliver significant speedups, and it integrates with other programming environments, including Python through libraries such as PyCUDA or frameworks such as TensorFlow with GPU support. Understanding the basic concepts of kernels, threads, blocks, and warps is essential for programming GPUs effectively with CUDA.
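To make the kernel, thread, block, and warp vocabulary concrete, here is a minimal sketch in plain Python. It runs on the CPU and does not use CUDA or a GPU at all: it only emulates the index arithmetic a one-dimensional CUDA kernel typically performs. The names `blocks_needed`, `vector_add_kernel`, and `launch` are hypothetical helpers invented for this illustration, not part of any CUDA library.

```python
# CPU-only emulation of CUDA-style 1D thread indexing (assumption: illustrative
# sketch only; no GPU, CUDA toolkit, or PyCUDA is involved).

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs to date


def blocks_needed(n, threads_per_block):
    """Ceiling division: the number of blocks required to cover n elements."""
    return (n + threads_per_block - 1) // threads_per_block


def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Work done by one emulated thread: find its global index, then guard."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # surplus threads in the last block do nothing
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical stand-in for a kernel launch; runs the threads one by one."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


n = 1000
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
out = [0.0] * n

threads_per_block = 256
grid = blocks_needed(n, threads_per_block)  # 4 blocks, i.e. 1024 emulated threads
launch(vector_add_kernel, grid, threads_per_block, a, b, out)

warps_per_block = threads_per_block // WARP_SIZE  # 8 warps in each block
```

On a real GPU the threads run in parallel rather than in a loop, and the input arrays would first be copied from host memory to device memory. The bounds check matters because the grid is rounded up to a whole number of blocks, so some threads have no element to process.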
View the most prominent open source CUDA projects in the list below.
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆51,794 · Updated this week
- World's fastest and most advanced password recovery utility ☆22,952 · Updated this week
- Build and run Docker containers leveraging NVIDIA GPUs ☆17,394 · Updated last year
- Instant neural graphics primitives: lightning fast NeRF and more ☆16,746 · Updated this week
- kaldi-asr/kaldi is the official location of the Kaldi project. ☆14,971 · Updated 2 months ago
- SGLang is a fast serving framework for large language models and vision language models. ☆15,747 · Updated this week
- Open3D: A Modern Library for 3D Data Processing ☆12,522 · Updated last week
- CUDA on non-NVIDIA GPUs ☆11,911 · Updated last week
- Burn is a next generation Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. ☆11,480 · Updated last week
- Solve puzzles. Learn CUDA. ☆11,230 · Updated 10 months ago
- NumPy aware dynamic Python compiler using LLVM ☆10,500 · Updated last week
- NumPy & SciPy for GPU ☆10,323 · Updated this week
- cuDF - GPU DataFrame Library ☆9,033 · Updated this week
- Containers for machine learning ☆8,704 · Updated this week
- OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. ☆9,350 · Updated last week
- A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other ma… ☆8,463 · Updated this week
- Modular ZK(Zero Knowledge) backend accelerated by GPU ☆7,762 · Updated 7 months ago
- CUDA Templates for Linear Algebra Subroutines ☆7,808 · Updated this week
- Samples for CUDA Developers which demonstrates features in CUDA Toolkit ☆7,711 · Updated last month
- Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO. ☆7,130 · Updated this week
- A flexible framework of neural networks for deep learning ☆5,909 · Updated last year
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. ☆5,706 · Updated this week
- ALIEN is a CUDA-powered artificial life simulation program. ☆5,196 · Updated this week
- A Python framework for accelerated simulation, data generation and spatial computing. ☆5,276 · Updated this week
- [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl ☆4,977 · Updated last year
- A PyTorch Library for Accelerating 3D Deep Learning Research ☆4,826 · Updated last month
- cuML - RAPIDS Machine Learning Library ☆4,810 · Updated this week
- ArrayFire: a general purpose GPU library. ☆4,737 · Updated this week
- Tengine is a lite, high performance, modular inference engine for embedded device ☆4,477 · Updated 4 months ago
- Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust. ☆4,521 · Updated this week
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆5,430 · Updated last week
- Lightning fast C++/CUDA neural network framework ☆4,112 · Updated this week
- HIP: C++ Heterogeneous-Compute Interface for Portability ☆4,111 · Updated this week
- Making it easier to work with shaders ☆4,218 · Updated this week
- Fast inference engine for Transformer models ☆3,902 · Updated 3 months ago
- LightSeq: A High Performance Library for Sequence Processing and Generation ☆3,282 · Updated 2 years ago
- Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer ☆3,243 · Updated 10 months ago
- Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators. ☆3,181 · Updated 3 weeks ago
- A retargetable MLIR-based machine learning compiler and runtime toolkit. ☆3,209 · Updated this week
- A GPU-powered real-time analytics storage and query engine. ☆3,057 · Updated 11 months ago