brightlaboratory / polydl
☆12 · Updated 3 years ago
Alternatives and similar repositories for polydl:
Users interested in polydl are comparing it to the libraries listed below:
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 · Updated 5 years ago
- ☆14 · Updated 3 years ago
- ☆13 · Updated last year
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/… ☆19 · Updated last year
- GEMM and Winograd based convolutions using CUTLASS ☆26 · Updated 4 years ago
- ColTraIn HBFP Training Emulator ☆16 · Updated 2 years ago
- ☆22 · Updated 2 years ago
- Multi-target compiler for Sum-Product Networks, based on MLIR and LLVM. ☆23 · Updated 4 months ago
- [ICML 2021] "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators" by Yonggan Fu, Yonga… ☆15 · Updated 3 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- Adaptive floating-point based numerical format for resilient deep learning ☆14 · Updated 3 years ago
- ☆26 · Updated last year
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆18 · Updated 2 years ago
- ☆17 · Updated 3 years ago
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation ☆27 · Updated 5 years ago
- A simulation framework for modeling efficiency of Graph Neural Network Dataflows ☆22 · Updated 2 months ago
- ☆21 · Updated 2 months ago
- Benchmark PyTorch Custom Operators ☆14 · Updated last year
- ☆13 · Updated 3 years ago
- Code base for OOPSLA'24 paper: UniSparse: An Intermediate Language for General Sparse Format Customization ☆30 · Updated 5 months ago
- ☆11 · Updated last year
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆14 · Updated 4 years ago
- Sparsity support for PyTorch ☆34 · Updated last month
- An Attention Superoptimizer ☆21 · Updated 3 months ago
- A curated list for Efficient Large Language Models ☆11 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆82 · Updated this week
- Sparse kernels for GNNs based on TVM ☆16 · Updated 4 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆25 · Updated 2 months ago
- ☆12 · Updated 2 years ago
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs ☆25 · Updated 10 months ago