aditya4d/gemm-vega64

Implement asm gemm on vega64 for 4096x4096 fp32 matrix

☆22

Alternatives and similar repositories for gemm-vega64

Users that are interested in gemm-vega64 are comparing it to the libraries listed below.

fsword73 / HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
☆66 • Mar 18, 2021 • Updated 5 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 • Jun 16, 2017 • Updated 9 years ago
Stefan20162016 / maxas-explained
An explanation of the (for me) missing parts of the sgemm in Scott Gray's maxas assembler: https://github.com/NervanaSystems/maxas
☆17 • Dec 22, 2018 • Updated 7 years ago
Zehaos / pycaffe-yolo
YOLO reimplementation in Caffe, written with Python layers.
☆14 • Apr 11, 2017 • Updated 9 years ago
PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆86 • Oct 8, 2019 • Updated 6 years ago
XiuYuLi / deepcore_source_code
Partial source code of deepcore v0.7
☆27 • Jun 28, 2020 • Updated 6 years ago
yzhaiustc / Optimizing-SGEMV-on-NVIDIA-GPUs
An implementation of SGEMV with performance comparable to cuBLAS.
☆12 • May 21, 2021 • Updated 5 years ago
RIKEN-RCCS / hpl-ai
An HPL-AI implementation for Fugaku
☆24 • Jun 29, 2021 • Updated 5 years ago
hpdps-group / hipSZ
A portable implementation of SZ lossy compression for AMD GPUs and Hygon DCUs.
☆11 • Feb 26, 2025 • Updated last year
carlushuang / gcnasm
amdgpu example code in hip/asm
☆66 • Updated this week
weifengliu-ssslab / Benchmark_SpMV_using_CSR
CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1)
☆26 • May 12, 2015 • Updated 11 years ago
chemeng / GPGPU-GMRES-Method
CUDA GPU implementation of GMRES iterative Solver
☆10 • Apr 16, 2012 • Updated 14 years ago
rainerzufalldererste / hypersonic-rANS
Some of the fastest decoding range-based Asymmetric Numeral Systems (rANS) codecs for x64
☆20 • Sep 3, 2024 • Updated last year
md2z34 / winograd_gpu
GPU implementation of Winograd convolution
☆10 • Oct 23, 2017 • Updated 8 years ago
h8liu / nachos
CSE120 Project
☆10 • Nov 19, 2014 • Updated 11 years ago
JieRen98 / SGEMM-SASS-Annotation
☆21 • Mar 22, 2021 • Updated 5 years ago
quettabit / convolution_kernel
Accelerating CNN's convolution operation on GPUs by using memory-efficient data access patterns.
☆14 • Dec 8, 2017 • Updated 8 years ago
riktw / SoftcoreComparisons
The code for an FPGA softcore comparison
☆11 • Jun 21, 2020 • Updated 6 years ago
bryancatanzaro / inplace
CUDA and OpenMP implementations of C2R/R2C inplace transposition
☆49 • Feb 10, 2015 • Updated 11 years ago
NaoyukiIchimura / cuda_image_filtering_global
☆11 • Dec 5, 2018 • Updated 7 years ago
PAA-NCIC / GSWITCH
A pattern-based algorithmic autotuner for graph processing on GPUs.
☆33 • Jun 25, 2025 • Updated last year
wzc810049078 / SRT-4-DIVISION
RADIX-4 SRT division
☆12 • Oct 31, 2019 • Updated 6 years ago
nauful / NLZM
Dictionary compressor with nibbled ANS and optimal parsing. Other compression experiments.
☆25 • Apr 13, 2025 • Updated last year
a2k-hanlon / linter-veriloghdl
Atom linter for Verilog/SystemVerilog, using Icarus Verilog, Slang, Verible or Verilator.
☆10 • Jul 12, 2023 • Updated 3 years ago
eth-cscs / spla
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…
☆32 • Jun 26, 2024 • Updated 2 years ago
masahi / tvm-winograd
Test winograd convolution written in TVM for CUDA and AMDGPU
☆41 • Oct 12, 2018 • Updated 7 years ago
pecostm32 / Anlogic_AL3-10_Analyzing
An attempt to reverse engineer a bitstream made for an AL3-10 FPGA
☆16 • Jan 6, 2023 • Updated 3 years ago
bohanw / jpeg_comp_verilog
JPEG Compression RTL implementation
☆11 • Aug 19, 2017 • Updated 8 years ago
qdu1995 / DQSD
☆11 • Jun 27, 2021 • Updated 5 years ago
patflick / miopen-benchmark
benchmarking miopen
☆17 • Jan 14, 2019 • Updated 7 years ago
har-in-air / SIPEED_TANG_PRIMER
Projects using the Sipeed Tang Primer FPGA development board
☆17 • Dec 6, 2020 • Updated 5 years ago
BG2BKK / my_benchmark
benchmark for linux server
☆13 • Nov 6, 2016 • Updated 9 years ago
wmy367 / Radix-2-division
unsigned Radix-2 SRT division (radix-2 division)
☆17 • May 12, 2015 • Updated 11 years ago
jg-fossh / uvm-python-verification-lib
UVM Python Verification Agents Library
☆15 • Mar 18, 2021 • Updated 5 years ago
YusukeNagasaka / Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
☆16 • May 7, 2019 • Updated 7 years ago
AsFigo / ivl_uvm
Adding UVM support to Icarus Verilog (and Verilator in the near future) by taking a step-by-step, bottom-up approach.
☆24 • Dec 27, 2022 • Updated 3 years ago
yongye / c
Tetris Game // Generalized Tetris in C
☆11 • Aug 15, 2017 • Updated 8 years ago
Risto97 / systemc_uvm_verilator
☆13 • Aug 22, 2022 • Updated 3 years ago
ROCm / rocWMMA
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆140 • Updated this week