Implement asm gemm on vega64 for 4096x4096 fp32 matrix
☆22Oct 12, 2019Updated 6 years ago
Alternatives and similar repositories for gemm-vega64
Users that are interested in gemm-vega64 are comparing it to the libraries listed below
- 14 basic topics for VEGA64 performance optimization☆64Mar 18, 2021Updated 4 years ago
- maxas Scott Gray's maxas assembler sgemm explaining the (for me) missing parts https://github.com/NervanaSystems/maxas☆17Dec 22, 2018Updated 7 years ago
- flexible-gemm conv of deepcore☆17Dec 2, 2019Updated 6 years ago
- An HPL-AI implementation for Fugaku☆23Jun 29, 2021Updated 4 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU☆84Oct 8, 2019Updated 6 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm)☆47Jun 16, 2017Updated 8 years ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…☆31Jun 26, 2024Updated last year
- CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1)☆26May 12, 2015Updated 10 years ago
- Subpart source code of deepcore v0.7☆27Jun 28, 2020Updated 5 years ago
- A pattern-based algorithmic autotuner for graph processing on GPUs.☆32Jun 25, 2025Updated 8 months ago
- Port of the LLVM compiler infrastructure to the time-predictable processor Patmos☆15Apr 2, 2025Updated 11 months ago
- RISC-V 64 CPU☆10Oct 4, 2025Updated 5 months ago
- ☆11Aug 23, 2023Updated 2 years ago
- Transparent serialization of python plain-old-data classes☆12Aug 31, 2022Updated 3 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆35Jul 28, 2020Updated 5 years ago
- ☆12Feb 15, 2024Updated 2 years ago
- Atom linter for Verilog/SystemVerilog, using Icarus Verilog, Slang, Verible or Verilator.☆10Jul 12, 2023Updated 2 years ago
- amdgpu example code in hip/asm☆56Updated this week
- RADIX-4 SRT division☆12Oct 31, 2019Updated 6 years ago
- CUDA and OpenMP implementations of C2R/R2C inplace transposition☆48Feb 10, 2015Updated 11 years ago
- A simple Cminus language compiler based on the x86 architecture☆10Apr 1, 2022Updated 3 years ago
- An implementation of memcpy for amd64 with clang/gcc☆15Feb 7, 2022Updated 4 years ago
- A sample kernel module showing the memory reordering.☆11May 30, 2020Updated 5 years ago
- Generate Linux Perf event tables for Apple Silicon☆17Dec 16, 2025Updated 2 months ago
- Instructions for modifying a BIOS image to enable AVX-512 on Alder Lake CPUs with modern Gigabyte motherboards☆13Feb 2, 2023Updated 3 years ago
- The code for an FPGA softcore comparison☆11Jun 21, 2020Updated 5 years ago
- A Rust library to handle OpenSSH key and other common SSH key☆15Jun 3, 2023Updated 2 years ago
- An MLIR-based AI compiler designed for Python frontend to RISC-V DSA☆13Oct 10, 2024Updated last year
- A tool for cross-checking Verilog compilers☆14Apr 16, 2025Updated 10 months ago
- JPEG Compression RTL implementation☆11Aug 19, 2017Updated 8 years ago
- Dynamic NFTs is a code-pattern & tooling that enables web3 creators to have true ownership of assets and create upgradable NFTs in a trus…☆13Jan 11, 2022Updated 4 years ago
- ☆10Nov 12, 2019Updated 6 years ago
- Compiler plugin for performance analysis of HIP applications☆13Apr 7, 2025Updated 11 months ago
- ☆13Nov 15, 2022Updated 3 years ago
- OCEAN – Open-source CXL Emulation at Hyperscale Architecture and Networking.☆23Feb 25, 2026Updated last week
- Huffman encoder☆10Sep 8, 2013Updated 12 years ago
- An online simulator for finite automata (FA), pushdown automata (PDA) and linear bounded automata (LBA).☆11Oct 30, 2017Updated 8 years ago
- GPU implementation of Winograd convolution☆10Oct 23, 2017Updated 8 years ago
- compatible library for ebpf programs to improve BTF portability☆14Oct 11, 2023Updated 2 years ago