NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆2,790 · Updated this week
Alternatives and similar repositories for cuda-python
Users who are interested in cuda-python are comparing it to the libraries listed below.
- PyTorch native quantization and sparsity for training and inference ☆2,125 · Updated this week
- CUDA Core Compute Libraries ☆1,711 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,507 · Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆3,239 · Updated this week
- A PyTorch native platform for training generative AI models ☆3,953 · Updated this week
- CUDA integration for Python, plus shiny features ☆1,957 · Updated 2 weeks ago
- Tile primitives for speedy kernels ☆2,478 · Updated this week
- CUDA Templates for Linear Algebra Subroutines ☆7,754 · Updated this week
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,434 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆4,326 · Updated this week
- An efficient C++17 GPU numerical computing library with Python-like syntax ☆1,332 · Updated this week
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆3,558 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,554 · Updated 3 weeks ago
- Optimized primitives for collective multi-GPU communication ☆3,798 · Updated last week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,294 · Updated this week
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,785 · Updated this week
- An Aspiring Drop-In Replacement for NumPy at Scale ☆902 · Updated last week
- Thunder gives you PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and … ☆1,367 · Updated this week
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/ ☆1,385 · Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆330 · Updated 2 weeks ago
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … ☆1,006 · Updated last week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆1,314 · Updated this week
- High-efficiency floating-point neural network inference operators for mobile, server, and Web ☆2,051 · Updated this week
- CUDA Library Samples ☆1,993 · Updated last week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆402 · Updated 3 weeks ago
- RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-a… ☆903 · Updated this week
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆5,468 · Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,596 · Updated last month
- NumPy & SciPy for GPU ☆10,293 · Updated this week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆986 · Updated 3 weeks ago