Tiramisu-Compiler / tiramisu_pytorch
Integration of Tiramisu (Compiler) into PyTorch
☆26 Updated 4 years ago
Alternatives and similar repositories for tiramisu_pytorch:
Users who are interested in tiramisu_pytorch are comparing it to the libraries listed below:
- Hybrid Tiny Hardware-aware Neural Architecture Search ☆15 Updated 2 years ago
- HW-PR-NAS is a single surrogate model trained to Pareto rank the architectures based on Accuracy, Latency and energy consumption ☆12 Updated 2 years ago
- A polyhedral compiler for expressing fast and portable data parallel algorithms ☆927 Updated 2 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 6 months ago
- ☆30 Updated 4 years ago
- GEMM and Winograd based convolutions using CUTLASS ☆26 Updated 4 years ago
- A self-contained version of the tutorial which can be easily cloned and viewed by others. ☆24 Updated 5 years ago
- CUDA Matrix Multiplication Optimization ☆155 Updated 6 months ago
- ☆16 Updated 5 years ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ☆72 Updated 4 years ago
- Benchmarks to capture important workloads. ☆29 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 8 months ago
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆49 Updated 6 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆133 Updated last year
- A Deep Learning Meta-Framework and HPC Benchmarking Library ☆81 Updated 2 years ago
- ☆12 Updated 3 years ago
- PyTorch interface for the IPU ☆177 Updated last year
- Conversions to MLIR EmitC ☆126 Updated last month
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated 10 months ago
- An MLIR frontend for tensor expressions ☆24 Updated 4 years ago
- A Data-Centric Compiler for Machine Learning ☆82 Updated last year
- ☆16 Updated 2 years ago
- Fast sparse deep learning on CPUs ☆52 Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆74 Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆65 Updated 6 years ago
- A Winograd Minimal Filter Implementation in CUDA ☆23 Updated 3 years ago
- Issues related to MLPerf™ training policies, including rules and suggested changes ☆94 Updated 2 months ago
- Stores documents and resources used by the OpenXLA developer community ☆114 Updated 5 months ago
- Training material for IPU users: tutorials, feature examples, simple applications ☆86 Updated last year
- ParaDnn: A systematic performance analysis methodology for deep learning. ☆38 Updated 4 years ago