carlushuang/gcnasm

carlushuang / gcnasm

amdgpu example code in hip/asm

☆66

Alternatives and similar repositories for gcnasm

Users who are interested in gcnasm are comparing it to the repositories listed below.

ROCm / FlyDSL
FlyDSL is the Python front‑end of the project: a Flexible Layout Python DSL for expressing tiling, partitioning, data movement, and kerne…
☆252 · Updated this week
ROCm / aiter
AI Tensor Engine for ROCm
☆507 · Updated this week
HazyResearch / HipKittens
Fast and Furious AMD Kernels
☆447 · Updated this week
ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆140 · Apr 10, 2026 · Updated 3 months ago
ROCm / ATOM
AiTer Optimized Model
☆145 · Updated this week
ROCm / composable_kernel
[DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror
☆540 · Updated this week
ROCm / mori
Modular RDMA Interface
☆162 · Updated this week
ROCm / gfx950-gluon-tutorials
A practical guide to high-performance gluon kernel development on AMD GFX9 GPUs.
☆41 · Updated this week
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆193 · Updated this week
ROCm / rocprof-compute-viewer
☆62 · Jul 16, 2026 · Updated last week
seb-v / fp32_sgemm_amd
Super fast FP32 matrix multiplication on RDNA3
☆92 · Mar 30, 2025 · Updated last year
AMD-AGI / Primus
A flexible and high-performance training framework designed for large-scale foundation model training on AMD GPUs
☆109 · Updated this week
mk1-project / quickreduce
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
☆38 · Aug 29, 2025 · Updated 10 months ago
RadeonFlow / RadeonFlow_Kernels
Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
☆79 · Feb 11, 2026 · Updated 5 months ago
aditya4d / gemm-vega64
Implement asm gemm on vega64 for 4096x4096 fp32 matrix
☆22 · Oct 12, 2019 · Updated 6 years ago
ROCm / rocSHMEM
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆146 · Jul 20, 2026 · Updated last week
AMD-AGI / Primus-Turbo
A high-performance acceleration library dedicated to large-scale model training on AMD GPUs
☆69 · Updated this week
ROCm / MISA
Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)
☆36 · Jul 30, 2025 · Updated 11 months ago
ROCm / tritonBLAS
A lightweight triton-based General Matrix Multiplication (GEMM) library.
☆66 · Jul 21, 2026 · Updated last week
iree-org / aster
ASTER 💫 : Assembly Tooling and Representations
☆34 · Jul 1, 2026 · Updated 3 weeks ago
sammysun0711 / ov_llm_bench
OpenVINO LLM Benchmark
☆11 · Dec 7, 2023 · Updated 2 years ago
luongthecong123 / fp8-quant-matmul
Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.
☆19 · Feb 9, 2026 · Updated 5 months ago
ROCm / rocm-examples
A collection of examples for the ROCm software stack
☆307 · Updated this week
Snektron / exaregex
Zig regex experiment
☆13 · Nov 6, 2025 · Updated 8 months ago
ROCm / rocmProfileData
☆30 · Updated this week
fsword73 / HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
☆66 · Mar 18, 2021 · Updated 5 years ago
ROCm / hipFile
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆23 · Updated this week
kennethdsheridan / rocm_gpu_tradecraft
Commands that will make you more comfortable with the ROCm toolkit.
☆18 · Aug 1, 2024 · Updated last year
owensgroup / SlabAlloc
A dynamic GPU memory allocator, suitable for warp synchronized scenarios.
☆11 · Aug 20, 2019 · Updated 6 years ago
ROCm / rocprof-trace-decoder
☆17 · Apr 10, 2026 · Updated 3 months ago
amd / amd-lab-notes
AMD lab notes with code examples to demonstrate use of AMD GPUs
☆116 · Jun 28, 2024 · Updated 2 years ago
ROCm / rocprofiler-compute
[DEPRECATED] Moved to ROCm/rocm-systems repo
☆165 · May 28, 2026 · Updated 2 months ago
JieRen98 / SGEMM-SASS-Annotation
☆21 · Mar 22, 2021 · Updated 5 years ago
KernelTuner / kernel_float
CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development
☆24 · Updated this week
yifuwang / symm-mem-recipes
☆170 · Dec 27, 2024 · Updated last year
kabir2505 / tiny-mixtral
☆44 · May 4, 2025 · Updated last year
ROCm / hipBLASLt
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆114 · Updated this week
Adaptyst / Adaptyst
A comprehensive and architecture-agnostic performance analysis tool.
☆13 · Jul 1, 2026 · Updated 3 weeks ago
vortexgpgpu / NVPTX-SPIRV-Translator
The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.
☆45 · Oct 25, 2021 · Updated 4 years ago