corsix / amx

Apple AMX Instruction Set

☆1,121

Alternatives and similar repositories for amx

Users who are interested in amx compare it to the libraries listed below.

dougallj / applegpu
Apple G13 GPU architecture docs and tools
☆597 Updated 2 months ago
philipturner / metal-benchmarks
Apple GPU microarchitecture
☆533 Updated 10 months ago
tzakharko / m4-sme-exploration
Exploring the scalable matrix extension of the Apple M4 processor
☆191 Updated 8 months ago
dougallj / applecpu
Apple Firestorm/Icestorm CPU microarchitecture docs
☆242 Updated 2 years ago
eiln / ane
Reverse engineered Linux driver for the Apple Neural Engine (ANE).
☆416 Updated last year
name99-org / AArch64-Explore
☆289 Updated 7 months ago
hollance / neural-engine
Everything we actually know about the Apple Neural Engine (ANE)
☆2,242 Updated 4 months ago
kuterd / nv_isa_solver
Nvidia Instruction Set Specification Generator
☆285 Updated last year
AsahiLinux / gpu
Dissecting the M1's GPU for 3D acceleration
☆1,007 Updated 3 years ago
tinygrad / 7900xtx
☆449 Updated 3 months ago
mikex86 / LibreCuda
☆1,044 Updated 2 months ago
exo-lang / exo
Exocompilation for productive programming of hardware accelerators
☆649 Updated last week
philipturner / amx-benchmarks
Running linear algebra as fast as possible on Apple silicon
☆21 Updated last year
gpuocelot / gpuocelot
GPUOcelot: A dynamic compilation framework for PTX
☆204 Updated 5 months ago
google / ml-compiler-opt
Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
☆709 Updated this week
j2kun / mlir-tutorial
MLIR For Beginners tutorial
☆1,029 Updated 2 weeks ago
geohot / cuda_ioctl_sniffer
Sniff CUDA ioctls
☆205 Updated 2 years ago
llvm / clangir
A new (MLIR based) high-level IR for clang.
☆516 Updated this week
nviennot / core-to-core-latency
Measures the latency between CPU cores
☆1,242 Updated 11 months ago
abeleinin / Metal-Puzzles
Solve Puzzles. Learn Metal 🤘
☆574 Updated 10 months ago
llvm / circt
Circuit IR Compilers and Tools
☆1,868 Updated this week
llvm / Polygeist
C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more!
☆566 Updated last month
xrq-phys / blis_apple
BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support.
☆35 Updated 2 years ago
google / nsync
nsync is a C library that exports various synchronization primitives, such as mutexes
☆1,183 Updated 3 months ago
apple-oss-distributions / xnu
☆2,426 Updated 2 months ago
libriscv / libriscv
The fastest RISC-V sandbox
☆901 Updated this week
Veedrac / microarchitecturometer
Measures microarchitectural details such as ROB size. Like https://github.com/travisdowns/robsize but without runtime code generation, wh…
☆129 Updated 4 years ago
ARM-software / optimized-routines
Optimized implementations of various library functions for ARM architecture processors
☆642 Updated last week
cloudcores / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆523 Updated 2 years ago
tenstorrent / tt-metal
TT-NN operator library, and TT-Metalium low level kernel programming model.
☆1,022 Updated this week