illinois-impact / klap
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆13 · Updated 5 years ago
Related projects
Alternatives and complementary repositories for klap
- An Attention Superoptimizer ☆20 · Updated 6 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · Updated 5 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated 3 weeks ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆17 · Updated 2 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27 · Updated last month
- PTX-EMU is a simple emulator for CUDA programs. ☆24 · Updated 10 months ago
- HeteroGen: transpiling C to heterogeneous HLS code with automated test generation and program repair (ASPLOS 2022) ☆17 · Updated last month
- Slides from 2021-12-15 talk, "TVM Developer Bootcamp – Writing Hardware Backends" ☆10 · Updated 2 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆47 · Updated 7 months ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆19 · Updated last year
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆13 · Updated 4 years ago
- ETHZ Heterogeneous Accelerated Compute Cluster. ☆29 · Updated last month
- Linux source code for ISCA 2020 paper "Enhancing and Exploiting Contiguity for Fast Memory Virtualization" ☆17 · Updated 4 years ago
- A Top-Down Profiler for GPU Applications ☆13 · Updated 8 months ago
- Memory consistency model checking and test generation library. ☆13 · Updated 8 years ago
- SparseP is the first open-source Sparse Matrix Vector Multiplication (SpMV) software package for real-world Processing-In-Memory (PIM) ar… ☆70 · Updated 2 years ago
- Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference ☆16 · Updated last year
- Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into ef… ☆60 · Updated 2 years ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 · Updated 5 years ago
- Noisy language compiler ☆17 · Updated 3 months ago
- Multi-target compiler for Sum-Product Networks, based on MLIR and LLVM. ☆22 · Updated this week