corsix / amx
Apple AMX Instruction Set
☆1,178 · Updated last year
Alternatives and similar repositories for amx
Users who are interested in amx are comparing it to the repositories listed below.
- Apple G13 GPU architecture docs and tools · ☆633 · Updated 7 months ago
- Apple GPU microarchitecture · ☆568 · Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor · ☆213 · Updated last year
- Apple Firestorm/Icestorm CPU microarchitecture docs · ☆248 · Updated 2 years ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). · ☆443 · Updated last year
- Nvidia Instruction Set Specification Generator · ☆306 · Updated last year
- Dissecting the M1's GPU for 3D acceleration · ☆1,015 · Updated 3 years ago
- ☆450 · Updated 8 months ago
- ☆1,073 · Updated 7 months ago
- Sniff CUDA ioctls · ☆219 · Updated 2 years ago
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. · ☆739 · Updated last week
- GPUOcelot: A dynamic compilation framework for PTX · ☆219 · Updated 10 months ago
- Exocompilation for productive programming of hardware accelerators · ☆697 · Updated last week
- C++ template library for high performance SIMD based sorting algorithms · ☆989 · Updated 3 months ago
- MLIR For Beginners tutorial · ☆1,178 · Updated 5 months ago
- A new (MLIR based) high-level IR for clang. · ☆566 · Updated last week
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… · ☆649 · Updated last week
- ☆296 · Updated last year
- ctypes wrappers for HIP, CUDA, and OpenCL · ☆130 · Updated last year
- The fastest RISC-V sandbox · ☆987 · Updated last month
- An unofficial cuda assembler, for all generations of SASS, hopefully :) · ☆564 · Updated 2 years ago
- C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more! · ☆595 · Updated 6 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs · ☆371 · Updated 8 months ago
- Measures the latency between CPU cores · ☆1,302 · Updated last year
- Backward compatible ML compute opset inspired by HLO/MHLO · ☆584 · Updated last week
- throwaway GPT inference · ☆141 · Updated last year
- FlashAttention (Metal Port) · ☆569 · Updated last year
- Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual" · ☆805 · Updated last year
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. · ☆36 · Updated 2 years ago
- Optimized implementations of various library functions for ARM architecture processors · ☆676 · Updated 2 weeks ago