TiledTensor/TiledLower

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/TiledTensor/TiledLower)

TiledTensor / TiledLower

TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.

☆13

Alternatives and similar repositories for TiledLower

Users that are interested in TiledLower are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

TiledTensor / TiledKernel
View on GitHub
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19May 12, 2024Updated 2 years ago
TiledTensor / TiledCUDA
View on GitHub
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192Jan 28, 2025Updated last year
microsoft / FractalTensor
View on GitHub
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆32Dec 21, 2024Updated last year
nicolaswilde / amx-gemm-handwritten
View on GitHub
Handwritten GEMM using Intel AMX (Advanced Matrix Extension)
☆17Jan 11, 2025Updated last year
LeiWang1999 / Stream-k.tvm
View on GitHub
☆20Sep 28, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
TiledTensor / TiledBench
View on GitHub
Benchmark tests supporting the TiledCUDA library.
☆19Nov 19, 2024Updated last year
tsinghua-ideal / Syno
View on GitHub
Source code repository for ASPLOS '25 paper "Syno: Structured Synthesis for Neural Operators"
☆15Aug 31, 2025Updated 10 months ago
ChandlerGuan / kperfir_artifact
View on GitHub
☆19May 9, 2025Updated last year
KuangjuX / TileGraph
View on GitHub
TileGraph is an experimental DNN compiler that utilizes static code generation and kernel fusion techniques.
☆11Sep 18, 2024Updated last year
KuangjuX / Paper-reading
View on GitHub
My Paper Reading Lists and Notes.
☆25May 8, 2026Updated 2 months ago
dame-cell / Triformer
View on GitHub
Transformers components but in Triton
☆34May 9, 2025Updated last year
VivekPanyam / cudaparsers
View on GitHub
Parsers for CUDA binary files
☆25Dec 29, 2023Updated 2 years ago
KuangjuX / cu-x
View on GitHub
🎉My Collections of CUDA Kernels~
☆11Jun 25, 2024Updated 2 years ago
triton-lang / kernels
View on GitHub
☆115Mar 12, 2026Updated 4 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
AlibabaResearch / mononn
View on GitHub
☆32Jul 17, 2024Updated 2 years ago
IBM / triton-dejavu
View on GitHub
Framework to reduce autotune overhead to zero for well known deployments.
☆101Sep 19, 2025Updated 10 months ago
ademeure / DeeperGEMM
View on GitHub
DeeperGEMM: crazy optimized version
☆86May 5, 2025Updated last year
microsoft / cusync
View on GitHub
☆27Feb 20, 2024Updated 2 years ago
LeiWang1999 / AutoGPTQ.tvm
View on GitHub
GPTQ inference TVM kernel
☆41Apr 25, 2024Updated 2 years ago
cornell-brg / torng-uecgra-scripts-hpca2021
View on GitHub
☆12Aug 4, 2022Updated 3 years ago
monellz / FlashTensor
View on GitHub
☆19Mar 4, 2025Updated last year
microsoft / TileFusion
View on GitHub
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115Jun 28, 2025Updated last year
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 11 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
YJMSTR / flash-linear-attention
View on GitHub
FLA but cuTile
☆27Apr 17, 2026Updated 3 months ago
GeeeekExplorer / kkbot
View on GitHub
A Feishu/Lark AI agent bot
☆15Feb 27, 2026Updated 4 months ago
Dao-AILab / fast-hadamard-transform
View on GitHub
Fast Hadamard transform in CUDA, with a PyTorch interface
☆338Mar 10, 2026Updated 4 months ago
leepoly / sm-profiler
View on GitHub
☆82Feb 5, 2026Updated 5 months ago
KnowingNothing / MatmulTutorial
View on GitHub
A Easy-to-understand TensorOp Matmul Tutorial
☆445Mar 5, 2026Updated 4 months ago
illinois-impact / klap
View on GitHub
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆15Jun 21, 2019Updated 7 years ago
josehu07 / summerset
View on GitHub
Distributed, Replicated, Protocol-generic Key-value Store in Async Rust for SMR Protocols Research
☆18Jul 5, 2026Updated 2 weeks ago
dsl-learn / cutile-learn
View on GitHub
NVIDIA cuTile learn
☆169Dec 9, 2025Updated 7 months ago
xinhaoc / ferret
View on GitHub
Autonomous CUDA kernel optimization agent with structured task specs and per-config scoring
☆17Jun 17, 2026Updated last month
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
ZipCPU / qoiimg
View on GitHub
Quite OK image compression Verilog implementation
☆23Nov 27, 2024Updated last year
flashinfer-ai / debug-print
View on GitHub
Debug print operator for cudagraph debugging
☆18Aug 2, 2024Updated last year
CMU-SAFARI / DRAM-Datasheet-Survey
View on GitHub
A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. …
☆11May 4, 2022Updated 4 years ago
pku-liang / MAGIS
View on GitHub
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆57May 29, 2024Updated 2 years ago
Huangxy-Minel / System-Design-for-Federated-Learning
View on GitHub
Paper list of federated learning: About system design
☆13Apr 13, 2022Updated 4 years ago
microsoft / triton-shared
View on GitHub
Shared Middle-Layer for Triton Compilation
☆340Dec 5, 2025Updated 7 months ago
eth-cscs / Tiled-MM
View on GitHub
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
☆33Apr 2, 2025Updated last year