apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆29 · Mar 6, 2024 · Updated last year
Alternatives and similar repositories for EVT_AE
Users who are interested in EVT_AE are comparing it to the libraries listed below.
- ☆10 · May 12, 2022 · Updated 3 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022] ☆19 · Aug 4, 2022 · Updated 3 years ago
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch ☆39 · Mar 27, 2025 · Updated 10 months ago
- ☆28 · Feb 26, 2023 · Updated 2 years ago
- Cute layout visualization ☆30 · Jan 18, 2026 · Updated 3 weeks ago
- ☆24 · May 9, 2025 · Updated 9 months ago
- ☆11 · Jun 29, 2021 · Updated 4 years ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- ☆88 · May 31, 2025 · Updated 8 months ago
- GPU Performance Advisor ☆65 · Jul 25, 2022 · Updated 3 years ago
- Fast GPU based tensor core reductions ☆13 · Jan 13, 2023 · Updated 3 years ago
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Nov 23, 2022 · Updated 3 years ago
- A synthesis flow for hybrid processing-in-RRAM modes ☆12 · Jul 15, 2021 · Updated 4 years ago
- FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Data on GPUs ☆14 · Sep 26, 2023 · Updated 2 years ago
- ☆13 · Mar 4, 2015 · Updated 10 years ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels ☆32 · Mar 15, 2021 · Updated 4 years ago
- ☆118 · May 19, 2025 · Updated 8 months ago
- ☆261 · Jul 11, 2024 · Updated last year
- HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration ☆15 · Sep 14, 2020 · Updated 5 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Nov 23, 2024 · Updated last year
- ☆175 · May 7, 2025 · Updated 9 months ago
- Parser script to process PyTorch autograd profiler results and convert the JSON file to Excel. ☆14 · Oct 8, 2019 · Updated 6 years ago
- ☆14 · Jan 24, 2023 · Updated 3 years ago
- Run OpenCL programs on mobile GPUs (Qualcomm & ARM)! ☆19 · Jun 27, 2018 · Updated 7 years ago
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch ☆18 · Dec 21, 2022 · Updated 3 years ago
- Shared Middle-Layer for Triton Compilation ☆326 · Dec 5, 2025 · Updated 2 months ago
- Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations" ☆40 · Nov 16, 2021 · Updated 4 years ago
- ☆49 · Apr 15, 2024 · Updated last year
- A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA. ☆66 · Updated this week
- ☆17 · Aug 9, 2025 · Updated 6 months ago
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators ☆19 · Updated this week
- Artifact for OSDI'21 GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. ☆70 · Mar 2, 2023 · Updated 2 years ago
- ☆46 · Jun 18, 2024 · Updated last year
- TPP experimentation on MLIR for linear algebra ☆144 · Feb 2, 2026 · Updated last week
- Fast and efficient attention method exploration and implementation. ☆25 · Mar 25, 2025 · Updated 10 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · May 12, 2024 · Updated last year
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- ☆17 · Jan 24, 2024 · Updated 2 years ago