JimyMa / FuncTs
[DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
☆15Updated last year
Alternatives and similar repositories for FuncTs
Users interested in FuncTs are comparing it to the libraries listed below.
- ☆42Updated last year
- ☆117Updated last month
- GitHub mirror of the triton-lang/triton repo.☆54Updated this week
- DeepSeek-V3/R1 inference performance simulator☆165Updated 5 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆53Updated last year
- A baseline repository of Auto-Parallelism in Training Neural Networks☆145Updated 3 years ago
- An easy-to-understand TensorOp Matmul tutorial☆376Updated 11 months ago
- WaferLLM: Large Language Model Inference at Wafer Scale☆49Updated last month
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM☆63Updated 3 weeks ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators☆115Updated 2 years ago
- Summary of some awesome work for optimizing LLM inference☆103Updated 3 months ago
- ☆28Updated last year
- Development repository for the Triton-Linalg conversion☆197Updated 6 months ago
- ☆84Updated 5 months ago
- GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving☆18Updated last month
- High performance Transformer implementation in C++.☆129Updated 7 months ago
- ☆81Updated 2 years ago
- A lightweight design for computation-communication overlap.☆161Updated this week
- ☆150Updated last year
- Compiler for Dynamic Neural Networks☆46Updated last year
- ☆23Updated 5 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.☆89Updated 2 years ago
- A benchmark suite tailored for deep learning operators☆42Updated 2 years ago
- nnScaler: Compiling DNN models for Parallel Training☆118Updated this week
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling☆39Updated 2 weeks ago
- LLM serving cluster simulator☆108Updated last year
- ☆13Updated last year
- LLM Inference analyzer for different hardware platforms☆87Updated last month
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …☆249Updated 2 months ago
- Examples of CUDA implementations by Cutlass CuTe☆225Updated 2 months ago