usstq / mm_amx

matmul using AMX instructions

☆22

Alternatives and similar repositories for mm_amx

Users that are interested in mm_amx are comparing it to the libraries listed below

XpuOS / xsched
A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs
☆140 Updated this week
ece-fast-lab / ASPLOS-2025-M5
This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …
☆14 Updated 8 months ago
open-neutrino / neutrino
☆212 Updated 4 months ago
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆103 Updated 2 years ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆65 Updated last year
WukLab / Mira
A Program-Behavior-Guided Far Memory System
☆35 Updated 2 years ago
lipracer / cuda-rt-hook
☆45 Updated 4 months ago
ut-datasys / tigon
Tigon: A Distributed Database for a CXL Pod [OSDI '25]
☆38 Updated last week
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43 Updated 3 years ago
SJTU-IPADS / PhoenixOS-Remoting
☆21 Updated 4 months ago
tallendev / uvm-eval
This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated…
☆36 Updated 2 years ago
nicexlab / GeminiFS
GeminiFS: A Companion File System for GPUs
☆66 Updated 9 months ago
Sys-KU / AutoTiering
[USENIX ATC 2021] Exploring the Design Space of Page Management for Multi-Tiered Memory Systems
☆48 Updated 3 years ago
utnslab / Medes
Deduplication over disaggregated memory for Serverless Computing
☆14 Updated 3 years ago
csl-iisc / GPM-ASPLOS22
☆36 Updated last year
ZaidQureshi / bam
☆203 Updated last week
lingfenghsiang / Nomad
OSDI'24 Nomad implementation
☆54 Updated 4 months ago
aoli-al / HFuse
Horizontal Fusion
☆24 Updated 3 years ago
SJTU-IPADS / ugache
☆23 Updated 2 years ago
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆40 Updated 6 months ago
alibaba-edu / qwen-bailian-usagetraces-anon
☆61 Updated last month
rkhan055 / SHADE
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
☆35 Updated 2 years ago
OSU-STARLAB / UVM_benchmark
☆32 Updated 5 years ago
HPMLL / NVIDIA-Hopper-Benchmark
☆66 Updated 6 months ago
intel / DTO
A user level library for applications to transparently use Intel DSA.
☆38 Updated 3 weeks ago
NEO-MLSys25 / NEO
NEO is an LLM inference engine built to ease the GPU memory crisis by CPU offloading
☆69 Updated 5 months ago
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆56 Updated 4 months ago
sitar-lab / NeuSight
☆57 Updated 5 months ago
mikeroyal / AMX-Guide
Advanced Matrix Extensions (AMX) Guide
☆106 Updated 3 years ago
alibaba / llm-scheduling-artifact
Artifact of the OSDI '24 paper “Llumnix: Dynamic Scheduling for Large Language Model Serving”
☆64 Updated last year