brightlaboratory / polydl
☆12 Updated 3 years ago
Alternatives and similar repositories for polydl:
Users who are interested in polydl are comparing it to the libraries listed below.
- ☆15 Updated 3 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆74 Updated this week
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/… ☆18 Updated last year
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 Updated 5 years ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 8 months ago
- A simulation framework for modeling efficiency of Graph Neural Network Dataflows ☆22 Updated last month
- ☆39 Updated 5 years ago
- A Data-Centric Compiler for Machine Learning ☆82 Updated last year
- ☆30 Updated 2 years ago
- ☆21 Updated last month
- Code base for OOPSLA'24 paper: UniSparse: An Intermediate Language for General Sparse Format Customization ☆30 Updated 4 months ago
- [ICML 2021] "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators" by Yonggan Fu, Yonga… ☆15 Updated 3 years ago
- HW/SW co-design of sentence-level energy optimizations for latency-aware multi-task NLP inference ☆46 Updated last year
- ☆24 Updated last year
- ☆92 Updated 11 months ago
- ☆43 Updated 4 years ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆32 Updated last week
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 Updated 5 years ago
- ☆26 Updated this week
- ☆14 Updated last year
- HeteroCL-MLIR dialect for accelerator design ☆40 Updated 6 months ago
- ☆17 Updated 4 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated last year
- ☆22 Updated 2 years ago
- GEMM and Winograd based convolutions using CUTLASS ☆26 Updated 4 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated 2 weeks ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆107 Updated 4 months ago
- A Winograd Minimal Filter Implementation in CUDA ☆24 Updated 3 years ago
- ☆29 Updated last year
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆134 Updated 2 years ago