wentasah / mmul-anim
Visualization of cache-optimized matrix multiplication
☆45 · Updated 5 years ago
Related projects:
- Tenstorrent TT-BUDA Repository ☆204 · Updated last week
- Tenstorrent MLIR compiler ☆52 · Updated this week
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆399 · Updated this week
- A minimal Tensor Processing Unit (TPU) inspired by Google's TPUv1. ☆110 · Updated last month
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆88 · Updated 11 months ago
- CUDA Learning guide ☆203 · Updated 3 months ago
- NVIDIA tools guide ☆60 · Updated last month
- Nvidia Instruction Set Specification Generator ☆211 · Updated 2 months ago
- ☆124 · Updated last week
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆190 · Updated last week
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆152 · Updated 11 months ago
- Fast CUDA matrix multiplication from scratch ☆423 · Updated 8 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆43 · Updated this week
- A Data-Centric Compiler for Machine Learning ☆81 · Updated 8 months ago
- CUDA Matrix Multiplication Optimization ☆118 · Updated 2 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆136 · Updated 3 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆250 · Updated this week
- UNet diffusion model in pure CUDA ☆562 · Updated 2 months ago
- ☆57 · Updated last month
- Alex Krizhevsky's original code from Google Code ☆185 · Updated 8 years ago
- LLM training in simple, raw C/CUDA ☆79 · Updated 4 months ago
- Slides, notes, and materials for the workshop ☆297 · Updated 3 months ago
- Tenstorrent Kernel Module ☆30 · Updated 2 weeks ago
- Qualcomm Cloud AI SDK (Platform and Apps) enable high performance deep learning inference on Qualcomm Cloud AI platforms delivering high … ☆50 · Updated 2 weeks ago
- Buda Compiler Backend for Tenstorrent devices ☆20 · Updated last week
- ☆66 · Updated this week
- ☆124 · Updated 7 months ago
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆283 · Updated this week
- News and Paper Collections for Machine Learning Hardware ☆20 · Updated 4 months ago
- IREE plugin repository for the AMD AIE accelerator ☆63 · Updated this week