gautam1858/tiny-gpu-compiler

An MLIR-based compiler that takes GPU kernels and compiles them to real hardware instructions. Interactive web visualizer included.

☆139

Alternatives and similar repositories for tiny-gpu-compiler

Users that are interested in tiny-gpu-compiler are comparing it to the libraries listed below.

adamtiger / tinyGPUlang
Tutorial on building a gpu compiler backend in LLVM
☆62 · Jan 11, 2025 · Updated last year
boopdotpng / blackhole-py
python driver and runtime for tenstorrent blackhole cards
☆16 · Updated this week
dataflowr / llm_efficiency
KV Cache & LoRA for minGPT
☆61 · Mar 4, 2026 · Updated 4 months ago
sakshambatra1 / microMLC
minimal compiler
☆24 · Feb 19, 2026 · Updated 5 months ago
ROCm / roc-optiq
A visualizer for the ROCm Profiler Tools
☆24 · Updated this week
Multi-V-VM / DoubleJIT-VM
A double JIT VM
☆22 · Jul 9, 2026 · Updated last week
SwaggasDeCatas / emuThreeDS
World's first Nintendo 3DS emulator for Apple devices based on Citra.
☆18 · Apr 7, 2023 · Updated 3 years ago
mk1-project / quickreduce
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
☆38 · Aug 29, 2025 · Updated 10 months ago
HamzaElshafie / h100_gemm
A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA.
☆79 · Feb 18, 2026 · Updated 5 months ago
PacktPublishing / LLVM-Code-Generation
LLVM Code Generation, published by Packt
☆272 · May 14, 2026 · Updated 2 months ago
DavidGinten / ML-compiler-exercise
An online tutorial to make MLIR more beginner friendly with an end-to-end deep learning compiler pipeline
☆55 · Jun 8, 2026 · Updated last month
ROCm / FlyDSL
FlyDSL is the Python front‑end of the project: Flexible LaYout DSL.
☆237 · Updated this week
rafasumi / mlir-tutorial
SBLP 2025 MLIR Tutorial
☆75 · Mar 25, 2026 · Updated 3 months ago
tiny-tpu-v2 / tiny-tpu
A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1
☆1,349 · Apr 3, 2026 · Updated 3 months ago
florianmattana / sass-king
Reverse engineering NVIDIA SASS instruction dictionary, kernel audits and pattern recognition across GPU architectures.
☆311 · May 18, 2026 · Updated 2 months ago
j4orz / ateenysitp
a whirlwind tour to deep learning and deep learning systems
☆81 · Updated this week
Zaneham / Moa
Monte Carlo neutron transport in C99. GPU via Booth (AMD MI300X, NVIDIA RTX). ENDF/B-VII.1 nuclear data. Validated against ICSBEP benchma…
☆15 · Jun 5, 2026 · Updated last month
Zaneham / Booth
Open-source CUDA, Triton and HIP compiler targeting multiple GPU and CPU architectures.
☆1,717 · Updated this week
JackonYang / hands-on-tvm
hands on model tuning with TVM and profile it on a Mac M1, x86 CPU, and GTX-1080 GPU.
☆51 · Jun 15, 2023 · Updated 3 years ago
Sanjeen1 / VSD-workshop-on-7nm-finfet-characterization-
This repository contains my work from the VLSI System Design (VSD) Workshop on 7nm FinFET Circuit Design and Characterization using the A…
☆16 · Sep 9, 2025 · Updated 10 months ago
patrick-toulme / pyptx
A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch
☆367 · Jul 9, 2026 · Updated last week
Groverkss / mlir-tutor
Exercises for Learning MLIR (Originally written for PPoPP 2026)
☆106 · Feb 5, 2026 · Updated 5 months ago
modular / max-llm-book
Build an LLM from scratch with MAX
☆64 · Updated this week
facebookresearch / tensor-layouts
A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.
☆231 · Jun 29, 2026 · Updated 3 weeks ago
NicolaLancellotti / cherry
🍒 Cherry programming language
☆17 · Sep 18, 2024 · Updated last year
nirw4nna / dsc
Tensor library & inference framework for machine learning
☆118 · Oct 3, 2025 · Updated 9 months ago
wafer-ai / gpu-perf-engineering-resources
A curriculum for learning about gpu performance engineering, from scratch to what the frontier AI labs do
☆1,256 · Apr 27, 2026 · Updated 2 months ago
qualcomm / hexagon-mlir
Hexagon-MLIR is a compiler toolchain for compiling and executing AI kernels and models on Qualcomm Hexagon Neural Processing Units (NPUs)…
☆177 · Jul 2, 2026 · Updated 2 weeks ago
BenChaliah / Tensa-Lang
TensaLang is a Tensor-first programming language, compiler, and runtime that let you write the Model’s inference engine (e.g. LLMs) and s…
☆77 · Feb 20, 2026 · Updated 5 months ago
gpu-mode / pygpubench
GPU kernel benchmarking
☆47 · Jun 10, 2026 · Updated last month
hkproj / multi-latent-attention
☆46 · May 24, 2025 · Updated last year
HazyResearch / HipKittens
Fast and Furious AMD Kernels
☆444 · Jul 10, 2026 · Updated last week
avpatel / kvmtool
Native Linux KVM tool
☆15 · May 18, 2026 · Updated 2 months ago
wjoeyzhewei / Alpha64_R10000_Superscalar_Processor
Alpha64 R10000 Two-Way Superscalar Processor
☆12 · May 6, 2019 · Updated 7 years ago
ravikumar1907 / llm-ebpf-tracer
☆27 · Jun 5, 2025 · Updated last year
chrinovicmu / relm
Linux kernel Type-1 hypervisor with modular VMX, SVM, ARM EL2, and RISC-V H support
☆27 · Updated this week
sdiehl / mlir-egglog
A toy compiler for NumPy array expressions that uses e-graphs and MLIR
☆122 · Jul 13, 2026 · Updated last week
uttamcoomar / NN_Digits
☆30 · Mar 22, 2026 · Updated 3 months ago
ezyang / cute-interactive
Interactive version of the CuTe layout paper
☆57 · Apr 14, 2026 · Updated 3 months ago