NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆2,827 · Updated last week
Alternatives and similar repositories for cuda-python
Users interested in cuda-python are comparing it to the libraries listed below.
- PyTorch native quantization and sparsity for training and inference ☆2,191 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,561 · Updated this week
- CUDA Templates for Linear Algebra Subroutines ☆7,941 · Updated last week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,349 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆4,496 · Updated this week
- CUDA Core Compute Libraries ☆1,761 · Updated this week
- Tile primitives for speedy kernels ☆2,517 · Updated this week
- An Aspiring Drop-In Replacement for NumPy at Scale ☆904 · Updated this week
- NVIDIA-curated collection of educational resources related to general-purpose GPU programming. ☆574 · Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆3,380 · Updated this week
- CUDA Library Samples ☆2,018 · Updated last week
- Optimized primitives for collective multi-GPU communication ☆3,865 · Updated last week
- A PyTorch native platform for training generative AI models ☆4,056 · Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆333 · Updated last week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,012 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,588 · Updated last week
- CUDA integration for Python, plus shiny features ☆1,967 · Updated last month
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,052 · Updated this week
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆1,574 · Updated this week
- PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri… ☆1,380 · Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆1,420 · Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer ☆3,080 · Updated this week
- RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-a… ☆909 · Updated last week
- NCCL Tests ☆1,177 · Updated last month
- GPU programming related news and material links ☆1,616 · Updated 6 months ago
- A Python framework for accelerated simulation, data generation and spatial computing. ☆5,299 · Updated this week
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,800 · Updated last week
- Puzzles for learning Triton ☆1,760 · Updated 8 months ago
- This repository contains tutorials and examples for Triton Inference Server ☆735 · Updated last month
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/ ☆1,400 · Updated this week