corsix / amx
Apple AMX Instruction Set
☆1,078 · Updated 4 months ago
Alternatives and similar repositories for amx:
Users who are interested in amx are comparing it to the repositories listed below.
- Apple G13 GPU architecture docs and tools · ☆585 · Updated last month
- Apple GPU microarchitecture · ☆519 · Updated 7 months ago
- Exploring the scalable matrix extension of the Apple M4 processor · ☆173 · Updated 6 months ago
- Apple Firestorm/Icestorm CPU microarchitecture docs · ☆239 · Updated last year
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). · ☆409 · Updated last year
- Dissecting the M1's GPU for 3D acceleration · ☆1,004 · Updated 3 years ago
- ☆282 · Updated 4 months ago
- Everything we actually know about the Apple Neural Engine (ANE) · ☆2,196 · Updated 2 months ago
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. · ☆685 · Updated last week
- Nvidia Instruction Set Specification Generator · ☆260 · Updated 10 months ago
- ☆444 · Updated last month
- ☆1,038 · Updated 5 months ago
- GPUOcelot: A dynamic compilation framework for PTX · ☆187 · Updated 2 months ago
- A new (MLIR based) high-level IR for clang. · ☆488 · Updated this week
- C++ template library for high performance SIMD based sorting algorithms · ☆929 · Updated last week
- ☆296 · Updated last year
- MLIR For Beginners tutorial · ☆964 · Updated 3 months ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) · ☆493 · Updated 2 years ago
- An introduction to ARM64 assembly on Apple Silicon Macs · ☆4,641 · Updated last month
- C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more! · ☆536 · Updated 7 months ago
- advanced compilers · ☆823 · Updated last week
- This repository contains high-performance implementations of memset and memcpy in assembly. · ☆329 · Updated 3 years ago
- Unofficial description of the CUDA assembly (SASS) instruction sets. · ☆91 · Updated last month
- GPU-accelerated compiler · ☆344 · Updated last year
- Implementations of SIMD instruction sets for systems which don't natively support them. · ☆2,653 · Updated last month
- A benchmark for low-level CPU micro-architectural features · ☆719 · Updated 3 years ago
- Exocompilation for productive programming of hardware accelerators · ☆599 · Updated this week
- A superoptimizer for LLVM IR · ☆2,223 · Updated 8 months ago
- The fastest RISC-V sandbox · ☆858 · Updated last month
- FlashAttention (Metal Port) · ☆483 · Updated 7 months ago