Exploring the scalable matrix extension of the Apple M4 processor
☆231 · Nov 7, 2024 · Updated last year
Alternatives and similar repositories for m4-sme-exploration
Users that are interested in m4-sme-exploration are comparing it to the libraries listed below.
- Apple AMX Instruction Set ☆1,225 · Dec 26, 2024 · Updated last year
- CPU micro benchmarks ☆80 · Apr 14, 2026 · Updated last month
- Apple GPU microarchitecture ☆606 · Sep 22, 2024 · Updated last year
- ☆54 · Oct 31, 2021 · Updated 4 years ago
- Stub for polymorphic code ☆11 · Mar 18, 2023 · Updated 3 years ago
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. ☆36 · Jan 7, 2023 · Updated 3 years ago
- Investigation into replacing the MES compiler ☆31 · May 6, 2026 · Updated last week
- FP64 equivalent GEMM by the Ozaki scheme with Int8 Tensor Cores ☆120 · Dec 2, 2025 · Updated 5 months ago
- ☆82 · Oct 29, 2024 · Updated last year
- An easy-to-use and fast library for task-based parallelism, utilizing coroutines. ☆332 · Sep 13, 2024 · Updated last year
- x64 assembler library in C ☆26 · Oct 5, 2020 · Updated 5 years ago
- ☆316 · May 13, 2026 · Updated last week
- Running linear algebra as fast as possible on Apple silicon ☆29 · Aug 18, 2023 · Updated 2 years ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆109 · May 13, 2026 · Updated last week
- Interactive GUI Snowfall Simulation Created in C & Raylib ☆23 · Dec 24, 2025 · Updated 4 months ago
- A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models ☆50 · Dec 24, 2025 · Updated 4 months ago
- ☆11 · Sep 11, 2023 · Updated 2 years ago
- Nvidia Instruction Set Specification Generator ☆323 · Jul 9, 2024 · Updated last year
- Reproducibility package for "Robust Join Processing with Diamond Hardened Joins" ☆12 · Jul 10, 2024 · Updated last year
- Experimental cooperative threading library for embedded ARM in pure C ☆20 · Aug 18, 2021 · Updated 4 years ago
- The University of Bristol HPC Simulation Engine ☆109 · Aug 30, 2025 · Updated 8 months ago
- Trying to figure various CPU things out ☆167 · Jan 31, 2026 · Updated 3 months ago
- ☆10 · Nov 14, 2022 · Updated 3 years ago
- ☆13 · Jan 3, 2026 · Updated 4 months ago
- A house for your mouse so you always know where it's at ☆21 · Dec 2, 2025 · Updated 5 months ago
- x86-64, ARM, and RVV intrinsics viewer ☆79 · Feb 15, 2026 · Updated 3 months ago
- Interpreter and compiler for the ISA specification language "Architecture Specification Language" (ASL) ☆29 · Apr 30, 2026 · Updated 2 weeks ago
- GEMMul8 (GEMMulate): GEMM emulation using INT8/FP8 matrix engines based on the Ozaki Scheme II ☆69 · Apr 6, 2026 · Updated last month
- Pass Rust strings to C with potentially not needing heap allocation ☆12 · Jan 25, 2026 · Updated 3 months ago
- Prototype Rust implementation of hash-based signatures. See https://eprint.iacr.org/2025/055.pdf ☆55 · Dec 18, 2025 · Updated 5 months ago
- ☆16 · Dec 11, 2024 · Updated last year
- Binary Ninja Plugin for RISC-V ☆15 · Nov 29, 2023 · Updated 2 years ago
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- An LLVM IR dataset for data-driven compiler optimization research ☆79 · Mar 17, 2026 · Updated 2 months ago
- ARMv8 SHA256 instructions implementation in C ☆30 · Apr 28, 2016 · Updated 10 years ago
- Turn your Apple Watch into an ammeter to measure DC currents ☆195 · Sep 15, 2024 · Updated last year
- ☆56 · May 13, 2026 · Updated last week
- Zero allocation macros for retrieving multiple mutable indices from a mutable slice safely. ☆15 · Jul 21, 2024 · Updated last year
- ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines (FPGA 2025 Best Paper Nominee) ☆62 · Mar 8, 2026 · Updated 2 months ago