fsword73/HIP-Performance-Optmization-on-VEGA64

14 basic topics for VEGA64 performance optimization

☆66

Alternatives and similar repositories for HIP-Performance-Optmization-on-VEGA64

Users who are interested in HIP-Performance-Optmization-on-VEGA64 are comparing it to the libraries listed below.

aditya4d / gemm-vega64
Implement asm gemm on vega64 for 4096x4096 fp32 matrix
☆22 · Oct 12, 2019 · Updated 6 years ago
XiuYuLi / deepcore_source_code
Subpart source code of deepcore v0.7
☆27 · Jun 28, 2020 · Updated 6 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 · Dec 2, 2019 · Updated 6 years ago
carlushuang / gcnasm
amdgpu example code in hip/asm
☆66 · Updated this week
openmlir / mlir-tutorial
Hands-On Practical MLIR Tutorial
☆60 · Aug 21, 2025 · Updated 11 months ago
dorsal-lab / hip-analyzer
Compiler plugin for performance analysis of HIP applications
☆14 · Jul 1, 2026 · Updated 3 weeks ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 · Jun 16, 2017 · Updated 9 years ago
regehr / pldi22-llvm-tutorial
outline and links for PLDI 2022 tutorial
☆17 · Jun 13, 2022 · Updated 4 years ago
yzhaiustc / Optimizing-SGEMV-on-NVIDIA-GPUs
An implementation of SGEMV with performance comparable to cuBLAS.
☆12 · May 21, 2021 · Updated 5 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆126 · Jul 11, 2022 · Updated 4 years ago
ROCm / composable_kernel
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
☆540 · Updated this week
hpdps-group / hipSZ
A portable implementation of SZ lossy compression for AMD GPUs and Hygon DCUs.
☆11 · Feb 26, 2025 · Updated last year
harrism / cuda_event_benchmark
Unit benchmarks of CUDA event APIs.
☆17 · Apr 23, 2024 · Updated 2 years ago
plaidml / onnx-plaidml
An ONNX backend using PlaidML
☆28 · Jun 8, 2018 · Updated 8 years ago
carlushuang / cpu_gemm_opt
how to design cpu gemm on x86 with avx256, that can beat openblas.
☆75 · Apr 15, 2019 · Updated 7 years ago
chemeng / GPGPU-GMRES-Method
CUDA GPU implementation of GMRES iterative Solver
☆10 · Apr 16, 2012 · Updated 14 years ago
Jokeren / GPA
GPU Performance Advisor
☆66 · Jul 25, 2022 · Updated 4 years ago
ivanradanov / rodinia
Rodinia benchmark
☆24 · Jul 5, 2024 · Updated 2 years ago
rainerzufalldererste / hypersonic-rANS
Some of the fastest decoding range-based Asymmetric Numeral Systems (rANS) codecs for x64
☆20 · Sep 3, 2024 · Updated last year
anlongfei / compilerbook
compilerbook
☆53 · Apr 25, 2021 · Updated 5 years ago
llvm-gpu-news / llvm-gpu-news.github.io
☆15 · Jan 21, 2023 · Updated 3 years ago
JohndeVostok / APE
A GPU FP32 computation method with Tensor Cores.
☆27 · Dec 8, 2025 · Updated 7 months ago
ROCm / rocHPCG
HPCG benchmark based on ROCm platform
☆41 · Updated this week
JieRen98 / SGEMM-SASS-Annotation
☆21 · Mar 22, 2021 · Updated 5 years ago
RIKEN-RCCS / hpl-ai
An HPL-AI implementation for Fugaku
☆24 · Jun 29, 2021 · Updated 5 years ago
ROCm / TransformerEngine
☆72 · Updated this week
ROCm / rocSPARSE
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆135 · Jul 9, 2026 · Updated 2 weeks ago
ROCm / hipamd
☆34 · Jan 25, 2024 · Updated 2 years ago
hpcgame / hpcgame-platform-0th
HPC Game Platform
☆11 · Apr 20, 2023 · Updated 3 years ago
riktw / SoftcoreComparisons
The code for an FPGA softcore comparison
☆11 · Jun 21, 2020 · Updated 6 years ago
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,074 · Jan 3, 2023 · Updated 3 years ago
vortexgpgpu / NVPTX-SPIRV-Translator
The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.
☆45 · Oct 25, 2021 · Updated 4 years ago
jzfengziyan / shufflenetv2_hls
An FPGA-based Accelerator for Shufflenetv2 implemented on Xilinx Zynq-7000 SoC
☆16 · Apr 22, 2019 · Updated 7 years ago
intel / vc-intrinsics
☆59 · Updated this week
ariasanovsky / ptx-parser
☆11 · Jun 9, 2023 · Updated 3 years ago
suraj-srinivas / Huffman-encoder
Huffman encoder
☆10 · Sep 8, 2013 · Updated 12 years ago
ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆59 · Mar 20, 2025 · Updated last year
wzc810049078 / SRT-4-DIVISION
RADIX-4 SRT division
☆12 · Oct 31, 2019 · Updated 6 years ago
nauful / NLZM
Dictionary compressor with nibbled ANS and optimal parsing. Other compression experiments.
☆25 · Apr 13, 2025 · Updated last year