nicolaswilde/amx-gemm-handwritten

Handwritten GEMM using Intel AMX (Advanced Matrix Extension)

☆17

Alternatives and similar repositories for amx-gemm-handwritten

Users who are interested in amx-gemm-handwritten are comparing it to the libraries listed below.

TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
Oneflow-Inc / dfccl
☆26 · Feb 17, 2025 · Updated last year
KuangjuX / Paper-reading
My Paper Reading Lists and Notes.
☆25 · May 8, 2026 · Updated 2 months ago
flashinfer-ai / cubloaty
A size profiler for CUDA binaries
☆71 · Jan 15, 2026 · Updated 6 months ago
HanGuo97 / hilt
☆40 · Dec 14, 2025 · Updated 7 months ago
howardlau1999 / rdmapp
C++ interfaces for RDMA access
☆84 · Jul 13, 2026 · Updated last week
icsnju / VeriXmith
A tool for cross-checking Verilog compilers
☆15 · Apr 16, 2025 · Updated last year
irvingzhang0512 / tvm_tests
☆17 · Sep 2, 2020 · Updated 5 years ago
sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32 · Updated this week
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
Edenzzzz / claude-history-sync
Synchronizing Claude Code conversations across machines
☆16 · Jul 3, 2026 · Updated 2 weeks ago
Chtholly-Boss / swizzle
A practical way of learning Swizzle
☆42 · Feb 3, 2025 · Updated last year
Infrawaves / DeepEP_ibrc_dual-ports_multiQP
Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport
☆75 · May 9, 2025 · Updated last year
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
xlite-dev / netron-vscode-extension
☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.
☆14 · Jun 4, 2023 · Updated 3 years ago
SkyworkAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆17 · Jun 3, 2024 · Updated 2 years ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆52 · Jul 4, 2025 · Updated last year
VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆25 · Dec 29, 2023 · Updated 2 years ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
flashinfer-ai / debug-print
Debug print operator for cudagraph debugging
☆18 · Aug 2, 2024 · Updated last year
NGIOproject / PMTutorial
Slides and exercises for persistent memory programming tutorial
☆14 · Nov 14, 2022 · Updated 3 years ago
weishengying / cutlass_flash_atten_fp8
Implements fp8 flash attention on the Ada architecture using the cutlass repository
☆82 · Aug 12, 2024 · Updated last year
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
dlsyscourse / hw1
☆15 · Sep 25, 2025 · Updated 9 months ago
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
Triang-jyed-driung / i8muon
Muon in Int8 Precision Made Possible
☆20 · Jun 18, 2026 · Updated last month
bytedance / QSync
Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 · Feb 23, 2024 · Updated 2 years ago
totsugekitai / tvisor-kmod
Hypervisor from scratch in Linux
☆13 · May 8, 2022 · Updated 4 years ago
shzhxh / v9-doc
CPU + xv6 + compiler
☆18 · Feb 7, 2018 · Updated 8 years ago
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago
hw-native-sys / simpler
☆29 · Updated this week
sgl-project / whl
SGLang Kernel Wheel Index
☆24 · Updated this week
oscomp / proj275-eBPF-for-AI-ML-tracking-and-performance-analysis
☆12 · May 13, 2025 · Updated last year
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆19 · Nov 18, 2024 · Updated last year
WukLab / Mira
A Program-Behavior-Guided Far Memory System
☆36 · Oct 26, 2023 · Updated 2 years ago
jetafese / btor2mlir
Bᴛᴏʀ2MLIR: A Format and Toolchain for Hardware Verification
☆20 · Jul 8, 2026 · Updated last week
nfcim / fido2
Dart library to parse FIDO2 requests / responses and interact with FIDO2 (CTAP) authenticators.
☆14 · Sep 9, 2025 · Updated 10 months ago
lipracer / cuda-rt-hook
☆46 · Jul 16, 2025 · Updated last year
nicexlab / GeminiFS
GeminiFS: A Companion File System for GPUs
☆84 · Jul 8, 2026 · Updated last week