alexarmbr / matmul-playground
☆24Apr 7, 2025Updated 10 months ago
Alternatives and similar repositories for matmul-playground
Users who are interested in matmul-playground are comparing it to the repositories listed below.
- Docker cluster configuration for Hadoop☆11Jun 8, 2024Updated last year
- Hex encode & decode a string, right from your terminal.☆10Jan 5, 2023Updated 3 years ago
- Implementation of Generic L-layer Neural Network from Scratch☆12May 14, 2018Updated 7 years ago
- ☆10Sep 3, 2021Updated 4 years ago
- introduction to dataflow analysis using julia☆14Oct 26, 2020Updated 5 years ago
- ☆13Sep 2, 2021Updated 4 years ago
- Adds a new backend, Cpu0, to LLVM 17.0.6☆12Apr 22, 2024Updated last year
- Writeup that goes along with this:☆41Jan 18, 2018Updated 8 years ago
- ☆10Mar 3, 2024Updated last year
- ☆16Sep 7, 2025Updated 5 months ago
- This repo contains the Assignments from Cornell Tech's ECE 5545 - Machine Learning Hardware and Systems offered in Spring 2023☆42May 31, 2023Updated 2 years ago
- An MLIR-based compiler from C/C++ to AMD-Xilinx Versal AIE☆18Aug 5, 2022Updated 3 years ago
- ☆10Jul 22, 2020Updated 5 years ago
- Framework for Algorithmic Correctness Testing of Operators☆16Updated this week
- Research code and scripts used in the Silburt et al. (2021) EMNLP 2021 paper 'FANATIC: FAst Noise-Aware TopIc Clustering'☆11Jul 6, 2023Updated 2 years ago
- High-Performance FP32 GEMM on CUDA devices☆117Jan 21, 2025Updated last year
- ☆11Mar 20, 2023Updated 2 years ago
- ☆18Nov 11, 2025Updated 3 months ago
- Utilities for ROCm Tech Support Log Collections☆13Nov 21, 2025Updated 2 months ago
- Standalone command-line tool for compiling Triton kernels☆20Sep 13, 2024Updated last year
- 32-bit integer only RISC-V core, along with assembler, linker, and compiler from scratch☆23Sep 21, 2025Updated 4 months ago
- libForBES is a C++ solver for generic, constrained and possibly nonsmooth convex optimization problems. LASSO, optimal control, elastic n…☆10Apr 11, 2017Updated 8 years ago
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".☆104Sep 24, 2025Updated 4 months ago
- Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch)☆70Updated this week
- New York Times best sellers list with Google Books API☆14Dec 13, 2017Updated 8 years ago
- Content Addressable Memory using dimensionality reduction☆13Apr 22, 2017Updated 8 years ago
- 📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).☆64Apr 26, 2025Updated 9 months ago
- tutorials about polyhedral compilation.☆62Feb 9, 2026Updated last week
- Source code repository accompanying the scientific paper "Finding Efficient Spatial Distributions for Massively Instanced 3-d Models" (S.…☆16Apr 16, 2020Updated 5 years ago
- Xilinx Modifications to Halide☆14May 3, 2021Updated 4 years ago
- Supporting material for the book club☆15Jul 24, 2022Updated 3 years ago
- Generate versal system design from ONNX model. AI engine kernels. Sub-microsecond speeds for autoencoders.☆16Dec 29, 2024Updated last year
- A fast full-system simulator of Tenstorrent hardware☆43Feb 6, 2026Updated last week
- Wave: Python Domain-Specific Language for High Performance Machine Learning☆44Updated this week
- libevent based multi-threaded web server☆19Apr 3, 2016Updated 9 years ago
- Because it's there.☆16Sep 22, 2024Updated last year
- An alternative Vivado custom design example (to fully Vitis) for the User Logic Partition targeting VCK5000☆13Jul 16, 2024Updated last year
- User documentation on the usage of LUMI resources☆20Updated this week
- ☆18Jan 16, 2026Updated last month