NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆2,284 Updated this week
Alternatives and similar repositories for cuda-python:
Users who are interested in cuda-python are comparing it to the libraries listed below:
- PyTorch native quantization and sparsity for training and inference ☆1,954 Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,355 Updated this week
- CUDA Core Compute Libraries ☆1,591 Updated this week
- FlashInfer: Kernel Library for LLM Serving ☆2,659 Updated last week
- CUDA integration for Python, plus shiny features ☆1,923 Updated 2 months ago
- nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculat… ☆860 Updated 2 weeks ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆864 Updated 2 weeks ago
- A Datacenter Scale Distributed Inference Serving Framework ☆3,684 Updated this week
- A PyTorch native library for large model training ☆3,587 Updated this week
- An Aspiring Drop-In Replacement for NumPy at Scale ☆870 Updated this week
- Thunder gives you PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and … ☆1,323 Updated this week
- Pipeline Parallelism for PyTorch ☆762 Updated 7 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆369 Updated last week
- CUDA Kernel Benchmarking Library ☆618 Updated this week
- Efficient Triton Kernels for LLM Training ☆4,836 Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,087 Updated this week
- common in-memory tensor structure ☆978 Updated last week
- NVIDIA Math Libraries for the Python Ecosystem ☆281 Updated last month
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆971 Updated this week
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,038 Updated last year
- Tile primitives for speedy kernels ☆2,259 Updated this week
- CUDA Templates for Linear Algebra Subroutines ☆7,294 Updated last week
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters. ☆794 Updated this week
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,725 Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,213 Updated this week
- PyTorch extensions for high performance and large scale training. ☆3,298 Updated last week
- NCCL Tests ☆1,065 Updated last month
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆798 Updated this week
- Puzzles for learning Triton ☆1,566 Updated 4 months ago
- A Python package for extending the official PyTorch that can easily obtain performance on Intel platform ☆1,825 Updated last week