corsix / amx
Apple AMX Instruction Set
☆1,135 Updated 8 months ago
Alternatives and similar repositories for amx
Users who are interested in amx are comparing it to the repositories listed below
- Apple G13 GPU architecture docs and tools ☆616 Updated 3 months ago
- Apple GPU microarchitecture ☆548 Updated 11 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆199 Updated 10 months ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). ☆422 Updated last year
- Apple Firestorm/Icestorm CPU microarchitecture docs ☆241 Updated 2 years ago
- ☆295 Updated 8 months ago
- Nvidia Instruction Set Specification Generator ☆293 Updated last year
- Dissecting the M1's GPU for 3D acceleration ☆1,010 Updated 3 years ago
- ☆450 Updated 5 months ago
- Sniff CUDA ioctls ☆206 Updated 2 years ago
- ☆1,053 Updated 3 months ago
- MLIR For Beginners tutorial ☆1,065 Updated last month
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. ☆719 Updated 3 weeks ago
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. ☆35 Updated 2 years ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆356 Updated 4 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆206 Updated 7 months ago
- Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library ☆1,670 Updated 4 months ago
- Running linear algebra as fast as possible on Apple silicon ☆21 Updated 2 years ago
- Exocompilation for productive programming of hardware accelerators ☆657 Updated this week
- Measures the latency between CPU cores ☆1,263 Updated last year
- C++ template library for high performance SIMD based sorting algorithms ☆962 Updated this week
- FlashAttention (Metal Port) ☆529 Updated 11 months ago
- nsync is a C library that exports various synchronization primitives, such as mutexes ☆1,208 Updated this week
- Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual" ☆802 Updated last year
- Implementations of SIMD instruction sets for systems which don't natively support them. ☆2,792 Updated this week
- A new (MLIR based) high-level IR for clang. ☆534 Updated this week
- Solve Puzzles. Learn Metal 🤘 ☆582 Updated 11 months ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,118 Updated this week
- C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more! ☆572 Updated 2 months ago
- Circuit IR Compilers and Tools ☆1,893 Updated last week