Artifacts of EVT ASPLOS'24
☆29 · Mar 6, 2024 · Updated 2 years ago
Alternatives and similar repositories for EVT_AE
Users interested in EVT_AE are comparing it to the repositories listed below.
- ☆10 · May 12, 2022 · Updated 3 years ago
- ☆116 · May 16, 2025 · Updated 9 months ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022] · ☆19 · Aug 4, 2022 · Updated 3 years ago
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch · ☆39 · Mar 27, 2025 · Updated 11 months ago
- ☆28 · Feb 26, 2023 · Updated 3 years ago
- Cute layout visualization · ☆30 · Jan 18, 2026 · Updated last month
- ☆24 · May 9, 2025 · Updated 9 months ago
- ☆11 · Jun 29, 2021 · Updated 4 years ago
- FlexAttention w/ FlashAttention3 Support · ☆27 · Oct 5, 2024 · Updated last year
- ☆88 · May 31, 2025 · Updated 9 months ago
- GPU Performance Advisor · ☆66 · Jul 25, 2022 · Updated 3 years ago
- A synthesis flow for hybrid processing-in-RRAM modes · ☆12 · Jul 15, 2021 · Updated 4 years ago
- FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Data on GPUs · ☆14 · Sep 26, 2023 · Updated 2 years ago
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. · ☆14 · Nov 23, 2022 · Updated 3 years ago
- Fast GPU-based tensor core reductions · ☆13 · Jan 13, 2023 · Updated 3 years ago
- Benchmark tests supporting the TiledCUDA library. · ☆18 · Nov 19, 2024 · Updated last year
- CUDA Flux is a profiler for GPU applications that reports the basic-block execution frequencies of compute kernels · ☆32 · Mar 15, 2021 · Updated 4 years ago
- ☆262 · Jul 11, 2024 · Updated last year
- ☆118 · May 19, 2025 · Updated 9 months ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. · ☆14 · Nov 23, 2024 · Updated last year
- ☆14 · Mar 4, 2015 · Updated 11 years ago
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration · ☆15 · Sep 14, 2020 · Updated 5 years ago
- ☆178 · May 7, 2025 · Updated 9 months ago
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch · ☆18 · Dec 21, 2022 · Updated 3 years ago
- ☆14 · Jan 24, 2023 · Updated 3 years ago
- Run OpenCL programs on mobile GPUs (Qualcomm & ARM)! · ☆18 · Jun 27, 2018 · Updated 7 years ago
- Shared Middle-Layer for Triton Compilation · ☆329 · Dec 5, 2025 · Updated 3 months ago
- ☆49 · Apr 15, 2024 · Updated last year
- A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA. · ☆71 · Feb 18, 2026 · Updated 2 weeks ago
- ☆18 · Aug 9, 2025 · Updated 6 months ago
- Open deep learning compiler stack for CPUs, GPUs and specialized accelerators · ☆19 · Feb 24, 2026 · Updated last week
- Artifact for OSDI'21 GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. · ☆69 · Mar 2, 2023 · Updated 3 years ago
- Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations" · ☆41 · Nov 16, 2021 · Updated 4 years ago
- ☆46 · Jun 18, 2024 · Updated last year
- TiledKernel is a code generation library based on macro kernels and a memory-hierarchy graph data structure. · ☆19 · May 12, 2024 · Updated last year
- Fast and efficient attention method exploration and implementation. · ☆25 · Mar 25, 2025 · Updated 11 months ago
- TPP experimentation on MLIR for linear algebra · ☆146 · Feb 24, 2026 · Updated last week
- Quantized Attention on GPU · ☆44 · Nov 22, 2024 · Updated last year
- ☆17 · Jan 24, 2024 · Updated 2 years ago