corsix / amx
Apple AMX Instruction Set
☆1,067 · Updated 3 months ago
Alternatives and similar repositories for amx:
Users interested in amx also compare it to the repositories listed below:
- Apple G13 GPU architecture docs and tools ☆582 · Updated 3 weeks ago
- Apple GPU microarchitecture ☆511 · Updated 6 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆171 · Updated 5 months ago
- Apple Firestorm/Icestorm CPU microarchitecture docs ☆238 · Updated last year
- Nvidia Instruction Set Specification Generator ☆253 · Updated 9 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆185 · Updated 2 months ago
- An unofficial CUDA assembler, for all generations of SASS, hopefully :) ☆480 · Updated last year
- Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM. ☆673 · Updated this week
- MLIR For Beginners tutorial ☆950 · Updated 2 months ago
- Sniff CUDA ioctls ☆192 · Updated last year
- FlashAttention (Metal Port) ☆476 · Updated 6 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated 9 months ago
- Everything we actually know about the Apple Neural Engine (ANE) ☆2,188 · Updated last month
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆209 · Updated last year
- C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more! ☆534 · Updated 6 months ago
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆83 · Updated last month
- Assembler for NVIDIA Maxwell architecture ☆990 · Updated 2 years ago
- advanced compilers ☆814 · Updated 3 weeks ago
- GPU-accelerated compiler ☆342 · Updated last year
- The RISC-V Virtual Machine ☆1,034 · Updated last week
- A new (MLIR based) high-level IR for clang. ☆482 · Updated this week
- VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications. ☆888 · Updated last year
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,314 · Updated 2 weeks ago
- Exocompilation for productive programming of hardware accelerators ☆595 · Updated last week
- VSCode LLVM Compiler Explorer ☆229 · Updated 10 months ago
- nsync is a C library that exports various synchronization primitives, such as mutexes ☆1,157 · Updated last week