ROCm / pyrsmi
Python package of rocm-smi-lib
☆24 · Updated 3 months ago
Alternatives and similar repositories for pyrsmi
Users interested in pyrsmi are comparing it to the libraries listed below.
- ☆71 · Updated 7 months ago
- Extensible collectives library in Triton ☆90 · Updated 7 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated last week
- ☆51 · Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆101 · Updated this week
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆40 · Updated 3 months ago
- How to ensure correctness and ship LLM-generated kernels in PyTorch ☆114 · Updated last week
- High-Performance SGEMM on CUDA devices ☆109 · Updated 9 months ago
- Ahead of Time (AOT) Triton Math Library ☆81 · Updated this week
- ☆21 · Updated 8 months ago
- ☆93 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆223 · Updated last year
- Triton-based Symmetric Memory operators and examples ☆61 · Updated 3 weeks ago
- Explore training for quantized models ☆25 · Updated 3 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆187 · Updated this week
- A bunch of kernels that might make stuff slower 😉 ☆64 · Updated last week
- ☆218 · Updated 9 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 · Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆85 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆65 · Updated 2 weeks ago
- Framework to reduce autotune overhead to zero for well-known deployments ☆85 · Updated last month
- Official implementation for Training LLMs with MXFP4 ☆101 · Updated 6 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆125 · Updated last week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 · Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline ☆120 · Updated last year
- Experiment using Tangent to autodiff Triton ☆79 · Updated last year
- Development repository for the Triton language and compiler ☆136 · Updated this week
- Automatic differentiation for Triton kernels ☆28 · Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆392 · Updated 2 weeks ago