NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆2,995 · Updated this week
Alternatives and similar repositories for cuda-python
Users interested in cuda-python are comparing it to the libraries listed below.
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,763 · Updated this week
- PyTorch-native quantization and sparsity for training and inference ☆2,392 · Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,583 · Updated this week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆8,559 · Updated 2 weeks ago
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels ☆3,197 · Updated this week
- A datacenter-scale distributed inference serving framework ☆5,244 · Updated this week
- Tile primitives for speedy kernels ☆2,803 · Updated this week
- CUDA Core Compute Libraries ☆1,964 · Updated this week
- CUDA Library Samples ☆2,121 · Updated this week
- A Python framework for accelerated simulation, data generation and spatial computing ☆5,641 · Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆512 · Updated last month
- FlashInfer: Kernel Library for LLM Serving ☆3,861 · Updated this week
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,431 · Updated this week
- CUDA integration for Python, plus shiny features ☆1,994 · Updated last week
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,869 · Updated this week
- NVIDIA-curated collection of educational resources related to general-purpose GPU programming ☆735 · Updated this week
- A PyTorch-native platform for training generative AI models ☆4,504 · Updated this week
- NumPy and SciPy on multi-node, multi-GPU systems ☆934 · Updated last week
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it ☆623 · Updated 3 weeks ago
- PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri… ☆1,413 · Updated this week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆1,866 · Updated this week
- Learn CUDA Programming, published by Packt ☆1,196 · Updated last year
- Olive: Simplify ML model finetuning, conversion, quantization, and optimization for CPUs, GPUs and NPUs ☆2,124 · Updated last week
- CV-CUDA™ is an open-source, GPU-accelerated library for cloud-scale image processing and computer vision ☆2,589 · Updated 4 months ago
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster ☆1,062 · Updated last year
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments ☆823 · Updated 2 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆456 · Updated 2 weeks ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,504 · Updated last week
- TorchBench is a collection of open-source benchmarks used to evaluate PyTorch performance ☆987 · Updated last week
- This repository contains tutorials and examples for Triton Inference Server ☆782 · Updated 2 weeks ago