NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆2,704 Updated this week
Alternatives and similar repositories for cuda-python
Users who are interested in cuda-python are comparing it to the libraries listed below.
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ☆2,435 Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆3,044 Updated this week
- PyTorch native quantization and sparsity for training and inference ☆2,072 Updated this week
- CUDA Templates for Linear Algebra Subroutines ☆7,603 Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆1,225 Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆4,136 Updated this week
- Tile primitives for speedy kernels ☆2,399 Updated this week
- A PyTorch native platform for training generative AI models ☆3,868 Updated this week
- CUDA Core Compute Libraries ☆1,662 Updated this week
- Efficient Triton Kernels for LLM Training ☆5,120 Updated this week
- nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculat… ☆942 Updated last week
- PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT ☆2,762 Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆315 Updated 2 months ago
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,197 Updated this week
- Thunder gives you PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and … ☆1,350 Updated this week
- GPU programming related news and material links ☆1,527 Updated 4 months ago
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,505 Updated 2 months ago
- RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-a… ☆889 Updated this week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆956 Updated this week
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/ ☆1,342 Updated this week
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆453 Updated 2 weeks ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆3,350 Updated this week
- Material for gpu-mode lectures ☆4,501 Updated 3 months ago
- Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala. ☆623 Updated 2 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,417 Updated this week
- PyTorch extensions for high performance and large scale training. ☆3,322 Updated last month
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ☆791 Updated 3 months ago
- Transformer related optimization, including BERT, GPT ☆6,173 Updated last year
- CUDA Library Samples ☆1,956 Updated this week
- A Python package for extending the official PyTorch that can easily obtain performance on Intel platform ☆1,857 Updated this week