corsix / amx
Apple AMX Instruction Set
☆1,121 Updated 7 months ago
Alternatives and similar repositories for amx
Users who are interested in amx are comparing it to the repositories listed below.
- Apple G13 GPU architecture docs and tools ☆597 Updated 2 months ago
- Apple GPU microarchitecture ☆533 Updated 10 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆191 Updated 8 months ago
- Apple Firestorm/Icestorm CPU microarchitecture docs ☆242 Updated 2 years ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). ☆416 Updated last year
- ☆289 Updated 7 months ago
- Everything we actually know about the Apple Neural Engine (ANE) ☆2,242 Updated 4 months ago
- Nvidia Instruction Set Specification Generator ☆285 Updated last year
- Dissecting the M1's GPU for 3D acceleration ☆1,007 Updated 3 years ago
- ☆449 Updated 3 months ago
- ☆1,044 Updated 2 months ago
- Exocompilation for productive programming of hardware accelerators ☆649 Updated last week
- Running linear algebra as fast as possible on Apple silicon ☆21 Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆204 Updated 5 months ago
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. ☆709 Updated this week
- MLIR For Beginners tutorial ☆1,029 Updated 2 weeks ago
- Sniff CUDA ioctls ☆205 Updated 2 years ago
- A new (MLIR based) high-level IR for clang. ☆516 Updated this week
- Measures the latency between CPU cores ☆1,242 Updated 11 months ago
- Solve Puzzles. Learn Metal 🤘 ☆574 Updated 10 months ago
- Circuit IR Compilers and Tools ☆1,868 Updated this week
- C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more! ☆566 Updated last month
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. ☆35 Updated 2 years ago
- nsync is a C library that exports various synchronization primitives, such as mutexes ☆1,183 Updated 3 months ago
- ☆2,426 Updated 2 months ago
- The fastest RISC-V sandbox ☆901 Updated this week
- Measures microarchitectural details such as ROB size. Like https://github.com/travisdowns/robsize but without runtime code generation, wh… ☆129 Updated 4 years ago
- Optimized implementations of various library functions for ARM architecture processors ☆642 Updated last week
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆523 Updated 2 years ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,022 Updated this week