ROCm / pyrsmi
Python package of rocm-smi-lib
☆18 Updated last month
Related projects
Alternatives and complementary repositories for pyrsmi
- ☆55 Updated 5 months ago
- Extensible collectives library in Triton ☆72 Updated last month
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆38 Updated 10 months ago
- ☆45 Updated 2 weeks ago
- High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline. ☆90 Updated 4 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆50 Updated this week
- ☆88 Updated 2 months ago
- Ahead of Time (AOT) Triton Math Library ☆41 Updated this week
- Efficient, Flexible and Portable Structured Generation ☆53 Updated this week
- Framework to reduce autotune overhead to zero for well known deployments. ☆20 Updated this week
- Development repository for the Triton language and compiler ☆93 Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆53 Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆166 Updated 3 weeks ago
- MLPerf™ logging library ☆30 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆35 Updated 6 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 Updated 4 months ago
- ☆12 Updated last month
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 Updated 8 months ago
- ☆99 Updated last month
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆57 Updated 2 months ago
- ☆30 Updated this week
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆38 Updated 3 weeks ago
- ☆47 Updated 2 months ago
- GPTQ inference TVM kernel ☆36 Updated 6 months ago
- TensorRT LLM Benchmark Configuration ☆11 Updated 3 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆34 Updated 2 years ago
- ☆48 Updated 8 months ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆145 Updated this week
- RCCL Performance Benchmark Tests ☆50 Updated 3 weeks ago