CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by NVIDIA for general computing on its own GPUs (Graphics Processing Units). It empowers application developers to leverage the parallel processing capabilities of NVIDIA's GPUs to accelerate computation-heavy tasks, such as matrix operations, physics simulations, deep learning training, and real-time video processing. CUDA provides a C-like programming language that allows developers to write kernel functions, which are executed on the GPU, and manage memory between the host (CPU) and device (GPU) environments. Utilizing CUDA can lead to significant performance improvements in suitable applications, and it integrates well with various programming environments, including Python through libraries like PyCUDA or through frameworks like TensorFlow with GPU support. Understanding basic concepts such as kernels, threads, blocks, and warps is essential for developers to effectively harness the power of GPU programming with CUDA.

View the most prominent open source CUDA projects in the list below. Click on a specific project to view its alternative or complementary packages. Make comparisons and find the best package for your app.

Popular CUDA repositories:

vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆51,794Updated this week
hashcat / hashcat
World's fastest and most advanced password recovery utility
☆22,952Updated this week
NVIDIA / nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
☆17,394Updated last year
NVlabs / instant-ngp
Instant neural graphics primitives: lightning fast NeRF and more
☆16,746Updated this week
kaldi-asr / kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
☆14,971Updated 2 months ago
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆15,747Updated this week
isl-org / Open3D
Open3D: A Modern Library for 3D Data Processing
☆12,522Updated last week
vosen / ZLUDA
CUDA on non-NVIDIA GPUs
☆11,911Updated last week
tracel-ai / burn
Burn is a next generation Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
☆11,480Updated last week
srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,230Updated 10 months ago
numba / numba
NumPy aware dynamic Python compiler using LLVM
☆10,500Updated last week
cupy / cupy
NumPy & SciPy for GPU
☆10,323Updated this week
rapidsai / cudf
cuDF - GPU DataFrame Library
☆9,033Updated this week
replicate / cog
Containers for machine learning
☆8,704Updated this week
Oneflow-Inc / oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
☆9,350Updated last week
catboost / catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other ma…
☆8,463Updated this week
kroma-network / tachyon
Modular ZK(Zero Knowledge) backend accelerated by GPU
☆7,762Updated 7 months ago
NVIDIA / cutlass
CUDA Templates for Linear Algebra Subroutines
☆7,808Updated this week
NVIDIA / cuda-samples
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
☆7,711Updated last month
hybridgroup / gocv
Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
☆7,130Updated this week
chainer / chainer
A flexible framework of neural networks for deep learning
☆5,909Updated last year
XuehaiPan / nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
☆5,706Updated this week
chrxh / alien
ALIEN is a CUDA-powered artificial life simulation program.
☆5,196Updated this week
NVIDIA / warp
A Python framework for accelerated simulation, data generation and spatial computing.
☆5,276Updated this week
NVIDIA / thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
☆4,977Updated last year
NVIDIAGameWorks / kaolin
A PyTorch Library for Accelerating 3D Deep Learning Research
☆4,826Updated last month
rapidsai / cuml
cuML - RAPIDS Machine Learning Library
☆4,810Updated this week
arrayfire / arrayfire
ArrayFire: a general purpose GPU library.
☆4,737Updated this week
OAID / Tengine
Tengine is a lite, high performance, modular inference engine for embedded device
☆4,477Updated 4 months ago
Rust-GPU / Rust-CUDA
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
☆4,521Updated this week
xlite-dev / LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆5,430Updated last week
NVlabs / tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
☆4,112Updated this week
ROCm / hip
HIP: C++ Heterogeneous-Compute Interface for Portability
☆4,111Updated this week
shader-slang / slang
Making it easier to work with shaders
☆4,218Updated this week
OpenNMT / CTranslate2
Fast inference engine for Transformer models
☆3,902Updated 3 months ago
bytedance / lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
☆3,282Updated 2 years ago
Celtoys / Remotery
Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
☆3,243Updated 10 months ago
Jittor / jittor
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
☆3,181Updated 3 weeks ago
iree-org / iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
☆3,209Updated this week
uber / aresdb
A GPU-powered real-time analytics storage and query engine.
☆3,057Updated 11 months ago