ROCm / pyrsmi
Python package of rocm-smi-lib
☆20 · Updated 5 months ago
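pyrsmi exposes rocm-smi-lib GPU telemetry (device count, utilization, memory, etc.) to Python. Below is a minimal usage sketch; the `rocml` module and function names (`smi_initialize`, `smi_get_device_count`, `smi_get_device_utilization`, `smi_get_device_memory_used`, `smi_get_device_memory_total`, `smi_shutdown`) follow the package's documented API but may vary between releases, so verify them against your installed version.

```python
# Minimal sketch: query per-device utilization and memory via pyrsmi.
# Function names below are assumed from pyrsmi's documented rocml API;
# check `help(rocml)` on your installed version if they differ.
from pyrsmi import rocml

rocml.smi_initialize()
try:
    for dev in range(rocml.smi_get_device_count()):
        util = rocml.smi_get_device_utilization(dev)    # GPU busy percentage
        used = rocml.smi_get_device_memory_used(dev)    # bytes
        total = rocml.smi_get_device_memory_total(dev)  # bytes
        print(f"GPU {dev}: {util}% busy, {used / total:.1%} memory used")
finally:
    rocml.smi_shutdown()
```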
Alternatives and similar repositories for pyrsmi:
Users interested in pyrsmi are comparing it to the libraries listed below.
- ☆62 · Updated 3 weeks ago
- ☆25 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆39 · Updated this week
- Extensible collectives library in Triton ☆84 · Updated 6 months ago
- Ahead of Time (AOT) Triton Math Library ☆55 · Updated this week
- ☆101 · Updated 6 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline ☆101 · Updated 8 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆71 · Updated 6 months ago
- Development repository for the Triton language and compiler ☆114 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆34 · Updated this week
- Benchmarks to capture important workloads ☆30 · Updated last month
- Framework to reduce autotune overhead to zero for well-known deployments ☆63 · Updated this week
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
- Fast low-bit matmul kernels in Triton ☆267 · Updated this week
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 · Updated 7 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance ☆104 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆91 · Updated this week
- ☆73 · Updated 4 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆152 · Updated 10 months ago
- RCCL Performance Benchmark Tests ☆60 · Updated last week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks ☆58 · Updated 2 months ago
- ☆21 · Updated last month
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆111 · Updated last year
- ☆55 · Updated 2 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆62 · Updated 2 weeks ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆177 · Updated this week
- Home for OctoML PyTorch Profiler ☆108 · Updated last year
- oneCCL Bindings for Pytorch* ☆90 · Updated last week