ROCm / pyrsmi
Python package of rocm-smi-lib
☆19 Updated 3 months ago
Alternatives and similar repositories for pyrsmi:
Users who are interested in pyrsmi are comparing it to the libraries listed below.
- ☆57 Updated 7 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆57 Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆187 Updated last week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 Updated 4 months ago
- Extensible collectives library in Triton ☆76 Updated 3 months ago
- AMD SMI ☆46 Updated this week
- ☆21 Updated 2 months ago
- Development repository for the Triton language and compiler ☆102 Updated this week
- Ahead of Time (AOT) Triton Math Library ☆50 Updated this week
- ☆64 Updated 2 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 8 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆64 Updated this week
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- ☆96 Updated 4 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆75 Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆165 Updated this week
- The "Kaggle" for Kernel Developers ☆15 Updated this week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆93 Updated 6 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆230 Updated 2 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 Updated this week
- Cataloging released Triton kernels. ☆156 Updated last week
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 Updated last month
- Applied AI experiments and examples for PyTorch ☆211 Updated this week
- MLPerf™ logging library ☆32 Updated last week
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆219 Updated this week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆152 Updated last month
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆39 Updated 2 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 Updated last year
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 5 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 Updated last month