Stefan20162016 / maxas-explained
Scott Gray's maxas assembler sgemm, explaining the (for me) missing parts: https://github.com/NervanaSystems/maxas
☆13, updated 5 years ago
Related projects:
- Implement asm gemm on vega64 for 4096x4096 fp32 matrix ☆19, updated 4 years ago
- assembler for NVIDIA FERMI. Imported from Google Code ☆68, updated 9 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆76, updated 4 years ago
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com ☆38, updated 9 months ago
- A framework that helps implementing swizzle GPU kernels ☆38, updated 4 years ago
- Source for Demystifying GPU Microarchitecture through Microbenchmarking ☆16, updated last year
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆97, updated last year
- Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git) ☆116, updated 2 years ago
- Bridging polyhedral analysis tools to the MLIR framework ☆99, updated last year
- Kernel Fusion and Runtime Compilation Based on NNVM ☆69, updated 7 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆32, updated 9 years ago
- A GPU cache model for research purposes ☆26, updated 10 years ago
- The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github… ☆32, updated 2 months ago
- GPU Performance Advisor ☆58, updated 2 years ago
- Polyhedral Extraction Tool (source repository: http://repo.or.cz/w/pet.git) ☆36, updated 2 years ago
- Flexible GPGPU instrumentation ☆85, updated 4 years ago
- Performance Prediction Toolkit ☆51, updated 2 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27, updated last year
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆107, updated last year
- HCC Sample Applications ☆13, updated 7 years ago
- CUDAAdvisor: a GPU profiling tool ☆48, updated 6 years ago
- Integer Set Library (source repository: http://repo.or.cz/w/isl.git) ☆63, updated last year
- Decuda and cudasm, the CUDA binary utilities package. Low-level tools for NVidia G80 GPUs. ☆94, updated 14 years ago
- Chunky Loop Interaction ☆23, updated 5 years ago
- Haystack is an analytical cache model that given a program computes the number of cache misses. ☆42, updated 5 years ago
- Chai ☆41, updated 9 months ago
- A simple tool to profile performance of multiple combinations of GEMM of cuBLAS ☆24, updated 3 years ago