monellz/FlashTensor

☆19

Alternatives and similar repositories for FlashTensor

Users that are interested in FlashTensor are comparing it to the libraries listed below.

AlibabaResearch / mononn
☆32 · Jul 17, 2024 · Updated 2 years ago
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆75 · Dec 11, 2025 · Updated 7 months ago
ChandlerGuan / kperfir_artifact
☆19 · May 9, 2025 · Updated last year
ChijinZ / PolyJuice-Fuzzer
A DL compiler fuzzer
☆15 · Nov 1, 2024 · Updated last year
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33 · Feb 10, 2025 · Updated last year
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
illinois-impact / klap
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆15 · Jun 21, 2019 · Updated 7 years ago
aoli-al / HFuse
Horizontal Fusion
☆24 · Jan 7, 2022 · Updated 4 years ago
tsinghua-ideal / Syno
Source code repository for ASPLOS '25 paper "Syno: Structured Synthesis for Neural Operators"
☆15 · Aug 31, 2025 · Updated 10 months ago
tile-ai / tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆19 · Jul 13, 2026 · Updated last week
alibaba / redfuser
☆21 · Mar 17, 2026 · Updated 4 months ago
leloykun / steepest-descent-lean
Deriving steepest descent convergence bounds and hyperparameter scaling laws in machine learning optimization from first principles, form…
☆16 · Apr 11, 2026 · Updated 3 months ago
Zhaoshixin-sky / CIM-MLC
[ASPLOS 2024] CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
☆47 · May 25, 2024 · Updated 2 years ago
wudu98 / autoGEMM
☆15 · Dec 5, 2024 · Updated last year
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆57 · May 29, 2024 · Updated 2 years ago
zhaiyi000 / tlm
☆49 · Jul 13, 2024 · Updated 2 years ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆64 · Jun 5, 2024 · Updated 2 years ago
summerspringwei / souffle-ae
☆17 · Jan 24, 2024 · Updated 2 years ago
SJTU-ReArch-Group / Paper-Reading-List
☆154 · Jun 17, 2026 · Updated last month
wangxy-2000 / pimsim-nn
☆64 · Feb 29, 2024 · Updated 2 years ago
uiuc-arc / neptune
☆27 · Jun 18, 2026 · Updated last month
triton-lang / Triton-to-tile-IR
Incubator repo for CUDA-TileIR backend
☆148 · Jul 10, 2026 · Updated last week
mlc-ai / mlc-python
☆36 · Jul 19, 2025 · Updated last year
adam-smnk / Open-CIM-Compiler
☆34 · Jun 7, 2021 · Updated 5 years ago
ChandlerGuan / mercury_artifact
☆27 · Oct 1, 2025 · Updated 9 months ago
microsoft / cusync
☆27 · Feb 20, 2024 · Updated 2 years ago
LeiWang1999 / Stream-k.tvm
☆20 · Sep 28, 2024 · Updated last year
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
escalab / RTSpMSpM
☆25 · Apr 13, 2025 · Updated last year
tud-ccc / Cinnamon
☆45 · Updated this week
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆101 · Sep 19, 2025 · Updated 10 months ago
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆125 · Oct 26, 2022 · Updated 3 years ago
ParCIS / FlashSparse
FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa…
☆39 · Oct 5, 2025 · Updated 9 months ago
merrymercy / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆11 · Mar 25, 2024 · Updated 2 years ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 · Mar 31, 2023 · Updated 3 years ago
triton-lang / triton-ext
A collection of out-of-tree extensions for the Triton language and compiler
☆30 · Jul 13, 2026 · Updated last week
khaki3 / ptxas-wrapper
A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code
☆16 · Mar 19, 2023 · Updated 3 years ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆32 · Dec 21, 2024 · Updated last year
horizon-research / imagen
☆10 · Mar 8, 2025 · Updated last year