amd / ZenDNN-pytorch-plugin
☆25 Updated last week
Alternatives and similar repositories for ZenDNN-pytorch-plugin
Users who are interested in ZenDNN-pytorch-plugin are comparing it to the libraries listed below.
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆68 Updated 2 weeks ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 Updated last year
- ☆59 Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆63 Updated 3 months ago
- RCCL Performance Benchmark Tests ☆75 Updated last week
- Development repository for the Triton language and compiler ☆135 Updated this week
- Ongoing research training transformer models at scale ☆29 Updated this week
- AI Tensor Engine for ROCm ☆285 Updated last week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 Updated last month
- ☆48 Updated this week
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆67 Updated 3 weeks ago
- oneCCL Bindings for Pytorch* ☆102 Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆83 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆211 Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆91 Updated this week
- Collection of kernels written in Triton language ☆156 Updated 6 months ago
- ☆63 Updated 10 months ago
- Provides the examples to write and build Habana custom kernels using the HabanaTools ☆23 Updated 6 months ago
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆41 Updated this week
- ☆240 Updated this week
- An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5). ☆265 Updated 3 months ago
- CUDA GPU Benchmark ☆33 Updated 8 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆99 Updated last week
- An experimental CPU backend for Triton ☆153 Updated this week
- Ahead of Time (AOT) Triton Math Library ☆79 Updated this week
- Fast and memory-efficient exact attention ☆193 Updated this week
- ☆27 Updated 3 weeks ago
- Fastest kernels written from scratch ☆374 Updated last month
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆278 Updated last week
- Cataloging released Triton kernels. ☆263 Updated last month