ROCm / pyrsmi
Python package for rocm-smi-lib
☆20 Updated 4 months ago
Alternatives and similar repositories for pyrsmi:
Users who are interested in pyrsmi compare it to the libraries listed below.
- ☆59 Updated 2 weeks ago
- Development repository for the Triton language and compiler ☆108 Updated this week
- RCCL Performance Benchmark Tests ☆59 Updated last month
- ☆34 Updated this week
- ☆18 Updated this week
- ☆67 Updated 3 months ago
- Ahead of Time (AOT) Triton Math Library ☆53 Updated this week
- extensible collectives library in triton ☆83 Updated 4 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 6 months ago
- Fast low-bit matmul kernels in Triton ☆238 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆18 Updated this week
- Bandwidth test for ROCm ☆54 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆79 Updated this week
- Benchmarks to capture important workloads. ☆29 Updated 3 weeks ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆171 Updated this week
- Applied AI experiments and examples for PyTorch ☆225 Updated this week
- ☆100 Updated 5 months ago
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆97 Updated 7 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 Updated 2 months ago
- ☆19 Updated this week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 Updated 5 months ago
- High-Performance SGEMM on CUDA devices ☆76 Updated last month
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 Updated 2 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆58 Updated last month
- Framework to reduce autotune overhead to zero for well known deployments. ☆61 Updated 3 weeks ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆77 Updated this week
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- A Data-Centric Compiler for Machine Learning ☆82 Updated last year
- Collection of kernels written in Triton language ☆105 Updated this week