corsix / amx
Apple AMX Instruction Set
☆1,058 Updated 2 months ago
Alternatives and similar repositories for amx
Users who are interested in amx are comparing it to the repositories listed below:
- Apple G13 GPU architecture docs and tools ☆579 Updated this week
- Exploring the scalable matrix extension of the Apple M4 processor ☆168 Updated 4 months ago
- Apple GPU microarchitecture ☆504 Updated 6 months ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). ☆399 Updated last year
- Apple Firestorm/Icestorm CPU microarchitecture docs ☆237 Updated last year
- Reverse engineering Rosetta 2 on M1 Mac ☆394 Updated 3 years ago
- Dissecting the M1's GPU for 3D acceleration ☆1,002 Updated 2 years ago
- Nvidia Instruction Set Specification Generator ☆253 Updated 8 months ago
- Everything we actually know about the Apple Neural Engine (ANE) ☆2,177 Updated 2 weeks ago
- An introduction to ARM64 assembly on Apple Silicon Macs ☆4,591 Updated 4 months ago
- Sniff CUDA ioctls ☆190 Updated last year
- C++ template library for high performance SIMD based sorting algorithms ☆924 Updated last week
- ☆437 Updated last week
- nsync is a C library that exports various synchronization primitives, such as mutexes ☆1,144 Updated 8 months ago
- The RISC-V Virtual Machine ☆1,015 Updated this week
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. ☆670 Updated this week
- A new (MLIR based) high-level IR for clang. ☆466 Updated this week
- FlashAttention (Metal Port) ☆457 Updated 6 months ago
- ☆295 Updated 11 months ago
- Implementations of SIMD instruction sets for systems which don't natively support them. ☆2,599 Updated last week
- C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more! ☆524 Updated 5 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆181 Updated last month
- MLIR For Beginners tutorial ☆930 Updated last month
- A tool for running small microbenchmarks on recent Intel and AMD x86 CPUs. ☆459 Updated 2 weeks ago
- Exocompilation for productive programming of hardware accelerators ☆569 Updated this week
- A superoptimizer for LLVM IR ☆2,211 Updated 6 months ago
- throwaway GPT inference ☆140 Updated 9 months ago
- Automatic verification of LLVM optimizations ☆877 Updated this week
- ctypes wrappers for HIP, CUDA, and OpenCL ☆129 Updated 8 months ago
- This repository contains high-performance implementations of memset and memcpy in assembly. ☆320 Updated 3 years ago