corsix / amx
Apple AMX Instruction Set
☆1,152 · Updated 9 months ago
Alternatives and similar repositories for amx
Users who are interested in amx are comparing it to the libraries listed below.
- Apple G13 GPU architecture docs and tools · ☆616 · Updated 4 months ago
- Apple GPU microarchitecture · ☆550 · Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor · ☆204 · Updated 11 months ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). · ☆424 · Updated last year
- Apple Firestorm/Icestorm CPU microarchitecture docs · ☆243 · Updated 2 years ago
- ☆299 · Updated last week
- Dissecting the M1's GPU for 3D acceleration · ☆1,012 · Updated 3 years ago
- ☆449 · Updated 6 months ago
- Nvidia Instruction Set Specification Generator · ☆293 · Updated last year
- Everything we actually know about the Apple Neural Engine (ANE) · ☆2,261 · Updated 7 months ago
- ☆1,055 · Updated 4 months ago
- nsync is a C library that exports various synchronization primitives, such as mutexes · ☆1,214 · Updated 3 weeks ago
- Exocompilation for productive programming of hardware accelerators · ☆667 · Updated this week
- The fastest RISC-V sandbox · ☆935 · Updated last month
- MLIR For Beginners tutorial · ☆1,092 · Updated 2 months ago
- Solve Puzzles. Learn Metal 🤘 · ☆587 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX · ☆209 · Updated 7 months ago
- Sniff CUDA ioctls · ☆211 · Updated 2 years ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs · ☆355 · Updated 5 months ago
- FlashAttention (Metal Port) · ☆538 · Updated last year
- The RISC-V Virtual Machine · ☆1,125 · Updated this week
- ☆293 · Updated last year
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. · ☆726 · Updated 2 weeks ago
- An introduction to ARM64 assembly on Apple Silicon Macs · ☆4,809 · Updated 6 months ago
- Running linear algebra as fast as possible on Apple silicon · ☆22 · Updated 2 years ago
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. · ☆35 · Updated 2 years ago
- This repository contains high-performance implementations of memset and memcpy in assembly. · ☆335 · Updated 3 years ago
- A new (MLIR based) high-level IR for clang. · ☆540 · Updated last week
- ctypes wrappers for HIP, CUDA, and OpenCL · ☆130 · Updated last year
- Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library · ☆1,680 · Updated 5 months ago