corsix / amx
Apple AMX Instruction Set
☆1,173 · Updated 11 months ago
Alternatives and similar repositories for amx
Users who are interested in amx are comparing it to the repositories listed below.
- Apple G13 GPU architecture docs and tools ☆628 · Updated 6 months ago
- Apple GPU microarchitecture ☆566 · Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor ☆211 · Updated last year
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). ☆442 · Updated last year
- Apple Firestorm/Icestorm CPU microarchitecture docs ☆246 · Updated 2 years ago
- ☆308 · Updated 2 months ago
- Nvidia Instruction Set Specification Generator ☆301 · Updated last year
- Everything we actually know about the Apple Neural Engine (ANE) ☆2,332 · Updated last month
- ☆449 · Updated 8 months ago
- Dissecting the M1's GPU for 3D acceleration ☆1,014 · Updated 3 years ago
- Reverse engineering Rosetta 2 on M1 Mac ☆421 · Updated 4 years ago
- Sniff CUDA ioctls ☆215 · Updated 2 years ago
- ☆1,067 · Updated 6 months ago
- Exocompilation for productive programming of hardware accelerators ☆690 · Updated this week
- Kernel extension that enables TSO for Apple silicon processes ☆263 · Updated 2 years ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆368 · Updated 7 months ago
- Running linear algebra as fast as possible on Apple silicon ☆27 · Updated 2 years ago
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. ☆36 · Updated 2 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆217 · Updated 10 months ago
- The fastest RISC-V sandbox ☆972 · Updated last month
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. ☆734 · Updated this week
- ☆295 · Updated last year
- MLIR For Beginners tutorial ☆1,155 · Updated 4 months ago
- nsync is a C library that exports various synchronization primitives, such as mutexes ☆1,235 · Updated last month
- Solve Puzzles. Learn Metal 🤘 ☆593 · Updated last year
- FlashAttention (Metal Port) ☆560 · Updated last year
- Measures the latency between CPU cores ☆1,288 · Updated last year
- ☆1,505 · Updated 3 years ago
- A new (MLIR based) high-level IR for clang. ☆560 · Updated this week
- This repository contains high-performance implementations of memset and memcpy in assembly. ☆337 · Updated 3 years ago