pytorch-labs / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆38 Updated 9 months ago
Alternatives and similar repositories for triton-cpu:
Users who are interested in triton-cpu are comparing it to the libraries listed below.
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 Updated this week
- MLIR-based partitioning system ☆62 Updated this week
- ☆67 Updated 3 months ago
- Extensible collectives library in Triton ☆83 Updated 4 months ago
- An experimental CPU backend for Triton ☆88 Updated this week
- A lightweight, Pythonic frontend for MLIR ☆80 Updated last year
- ☆48 Updated 11 months ago
- ☆72 Updated 2 months ago
- TPP experimentation on MLIR for linear algebra ☆119 Updated this week
- Ahead of Time (AOT) Triton Math Library ☆52 Updated this week
- An extension library of WMMA API (Tensor Core API) ☆88 Updated 7 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆89 Updated this week
- CUDA Templates for Linear Algebra Subroutines ☆14 Updated this week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆97 Updated 7 months ago
- The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR Python bindings. ☆79 Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆130 Updated this week
- ☆34 Updated this week
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated 11 months ago
- End-to-end steps for adding custom ops in PyTorch. ☆20 Updated 4 years ago
- Shared Middle-Layer for Triton Compilation ☆226 Updated this week
- ☆137 Updated this week
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆56 Updated this week
- ☆44 Updated last month
- ☆87 Updated 10 months ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆105 Updated 2 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆71 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆127 Updated last year
- ☆17 Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 6 months ago