ROCm / pyrsmi
Python package for rocm-smi-lib
☆20 · Updated 6 months ago
Alternatives and similar repositories for pyrsmi:
Users who are interested in pyrsmi are comparing it to the libraries listed below.
- ☆68 · Updated 3 weeks ago
- Ahead of Time (AOT) Triton Math Library · ☆57 · Updated this week
- Extensible collectives library in Triton · ☆85 · Updated 2 weeks ago
- ☆27 · Updated this week
- High-speed GEMV kernels, with up to a 2.7x speedup over the PyTorch baseline. · ☆105 · Updated 9 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… · ☆62 · Updated last month
- oneCCL Bindings for Pytorch* · ☆94 · Updated last week
- ☆77 · Updated 5 months ago
- Repository for CPU Kernel Generation for LLM Inference · ☆25 · Updated last year
- Perplexity GPU Kernels · ☆204 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆38 · Updated this week
- Benchmarks to capture important workloads. · ☆31 · Updated 2 months ago
- Home for OctoML PyTorch Profiler · ☆112 · Updated last year
- Test suite for probing the numerical behavior of NVIDIA tensor cores · ☆37 · Updated 8 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity · ☆72 · Updated 7 months ago
- ☆103 · Updated 7 months ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. · ☆108 · Updated 4 months ago
- Framework to reduce autotune overhead to zero for well known deployments. · ☆63 · Updated this week
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry · ☆40 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. · ☆110 · Updated this week
- Development repository for the Triton language and compiler · ☆118 · Updated this week
- RCCL Performance Benchmark Tests · ☆63 · Updated last week
- ☆21 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆40 · Updated last month
- MLPerf™ logging library · ☆34 · Updated last week
- DeeperGEMM: crazy optimized version · ☆65 · Updated 2 weeks ago
- Fast Hadamard transform in CUDA, with a PyTorch interface · ☆172 · Updated 10 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") · ☆317 · Updated this week
- Memory Optimizations for Deep Learning (ICML 2023) · ☆62 · Updated last year
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… · ☆86 · Updated this week