corsix / amx
Apple AMX Instruction Set
☆1,128 · Updated 7 months ago
Alternatives and similar repositories for amx
Users who are interested in amx are comparing it to the repositories listed below
- Apple G13 GPU architecture docs and tools ☆600 · Updated 3 months ago
- Apple GPU microarchitecture ☆547 · Updated 11 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆194 · Updated 9 months ago
- Apple Firestorm/Icestorm CPU microarchitecture docs ☆241 · Updated 2 years ago
- Nvidia Instruction Set Specification Generator ☆289 · Updated last year
- ☆449 · Updated 4 months ago
- ☆292 · Updated 7 months ago
- Sniff CUDA ioctls ☆204 · Updated 2 years ago
- Everything we actually know about the Apple Neural Engine (ANE) ☆2,249 · Updated 5 months ago
- ☆1,048 · Updated 3 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆207 · Updated 6 months ago
- Kernel extension that enables TSO for Apple silicon processes ☆264 · Updated 2 years ago
- Exocompilation for productive programming of hardware accelerators ☆654 · Updated this week
- BLIS fork with kernels for Apple M1. (Perhaps) the first open-source BLAS with Apple Matrix Coprocessor support. ☆35 · Updated 2 years ago
- Running linear algebra as fast as possible on Apple silicon ☆21 · Updated 2 years ago
- The fastest RISC-V sandbox ☆912 · Updated last week
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. ☆711 · Updated 3 weeks ago
- MLIR For Beginners tutorial ☆1,047 · Updated last month
- nsync is a C library that exports various synchronization primitives, such as mutexes ☆1,198 · Updated 4 months ago
- Measures the latency between CPU cores ☆1,250 · Updated last year
- A new (MLIR based) high-level IR for clang. ☆522 · Updated last week
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,090 · Updated this week
- ☆294 · Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 · Updated 4 months ago
- C++ template library for high performance SIMD based sorting algorithms ☆958 · Updated 2 months ago
- Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library ☆1,663 · Updated 4 months ago
- ☆55 · Updated last week
- GPU-accelerated compiler ☆349 · Updated last year
- Implementations of SIMD instruction sets for systems which don't natively support them. ☆2,764 · Updated last week