ROCm / pyrsmi
Python package of rocm-smi-lib
☆24 Updated last week
Alternatives and similar repositories for pyrsmi
Users that are interested in pyrsmi are comparing it to the libraries listed below
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆64 Updated this week
- ☆71 Updated 8 months ago
- ☆21 Updated 9 months ago
- Extensible collectives library in Triton ☆91 Updated 8 months ago
- Ahead of Time (AOT) Triton Math Library ☆84 Updated last week
- Quantize transformers to any learned arbitrary 4-bit numeric format ☆50 Updated 5 months ago
- Ship correct and fast LLM kernels to PyTorch ☆126 Updated this week
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆122 Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆227 Updated last year
- TORCH_LOGS parser for PT2 ☆69 Updated last month
- 👷 Build compute kernels ☆193 Updated this week
- High-Performance SGEMM on CUDA devices ☆113 Updated 11 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆69 Updated 8 months ago
- Prototype routines for GPU quantization written using PyTorch. ☆21 Updated 4 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 Updated 4 months ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆40 Updated 4 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆90 Updated 3 months ago
- ☆81 Updated 2 weeks ago
- ☆14 Updated last month
- Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools ☆71 Updated this week
- A bunch of kernels that might make stuff slower 😉 ☆65 Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆410 Updated this week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆320 Updated this week
- Home for OctoML PyTorch Profiler ☆114 Updated 2 years ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆133 Updated this week
- Explore training for quantized models ☆25 Updated 5 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 Updated last month
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆84 Updated last month
- ☆97 Updated last year
- ☆113 Updated last month