NVIDIA / cuda-python
CUDA Python Low-level Bindings
☆882 · Updated this week
Related projects
Alternatives and complementary repositories for cuda-python
- An Aspiring Drop-In Replacement for NumPy at Scale ☆620 · Updated last month
- Common in-memory tensor structure ☆905 · Updated last month
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆301 · Updated this week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆553 · Updated last week
- NVIDIA Math Libraries for the Python Ecosystem ☆202 · Updated 4 months ago
- CUDA Kernel Benchmarking Library ☆512 · Updated 2 weeks ago
- The Foundation for All Legate Libraries ☆189 · Updated last month
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,010 · Updated 6 months ago
- RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-a… ☆766 · Updated this week
- CUDA Core Compute Libraries ☆1,246 · Updated this week
- CUDA integration for Python, plus shiny features ☆1,849 · Updated this week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,679 · Updated last year
- RAPIDS Memory Manager ☆492 · Updated this week
- ☆482 · Updated this week
- CUDA Library Samples ☆1,600 · Updated this week
- An efficient C++17 GPU numerical computing library with Python-like syntax ☆1,211 · Updated this week
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters. ☆727 · Updated this week
- A GPU performance profiling tool for PyTorch models ☆493 · Updated 3 years ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ☆739 · Updated this week
- functorch is JAX-like composable function transforms for PyTorch. ☆1,395 · Updated this week
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it ☆446 · Updated 2 weeks ago
- KvikIO - High Performance File IO ☆156 · Updated this week
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆311 · Updated last month
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆426 · Updated this week
- An open-source efficient deep learning framework/compiler, written in Python. ☆649 · Updated this week
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillati… ☆534 · Updated this week
- Container plugin for Slurm Workload Manager ☆287 · Updated this week
- Pipeline Parallelism for PyTorch ☆725 · Updated 2 months ago
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,585 · Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆523 · Updated this week