☆33Jul 17, 2024Updated last year
Alternatives and similar repositories for mononn
Users that are interested in mononn are comparing it to the libraries listed below
- An Optimizing Compiler for Recommendation Model Inference☆26Jun 5, 2025Updated 8 months ago
- A recommendation model kernel optimizing system☆12Jun 5, 2025Updated 8 months ago
- ☆18Mar 4, 2025Updated 11 months ago
- A reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources☆12May 21, 2024Updated last year
- Horizontal Fusion☆24Jan 7, 2022Updated 4 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆14Nov 23, 2024Updated last year
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Feb 24, 2026Updated last week
- ☆17Jan 24, 2024Updated 2 years ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆56May 29, 2024Updated last year
- ☆11Apr 2, 2024Updated last year
- ☆13Apr 27, 2022Updated 3 years ago
- OSDI 2023 Welder, a deep learning compiler☆32Nov 24, 2023Updated 2 years ago
- ☆25Feb 20, 2024Updated 2 years ago
- ☆34May 23, 2025Updated 9 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆97Sep 19, 2025Updated 5 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 6 years ago
- play gemm with tvm☆92Jul 22, 2023Updated 2 years ago
- ☆18Apr 21, 2024Updated last year
- My study notes for MLSys☆14Nov 4, 2024Updated last year
- ☆14Jan 28, 2026Updated last month
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch☆39Mar 27, 2025Updated 11 months ago
- ☆20Sep 28, 2024Updated last year
- ☆21Oct 21, 2024Updated last year
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆32Dec 21, 2024Updated last year
- ☆27Mar 24, 2025Updated 11 months ago
- Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference☆19Mar 5, 2023Updated 2 years ago
- GitHub mirror of the triton-lang/triton repo.☆146Updated this week
- Shared Middle-Layer for Triton Compilation☆329Dec 5, 2025Updated 2 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 8 months ago
- My Paper Reading Lists and Notes.☆21Feb 17, 2026Updated 2 weeks ago
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆84Updated this week
- ☆178May 7, 2025Updated 9 months ago
- An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.☆50Jul 23, 2024Updated last year
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity☆234Sep 24, 2023Updated 2 years ago
- An Easy-to-understand TensorOp Matmul Tutorial☆410Feb 11, 2026Updated 3 weeks ago
- incubator repo for CUDA-TileIR backend☆109Feb 14, 2026Updated 2 weeks ago
- Dynamic Memory Management for Serving LLMs without PagedAttention☆464May 30, 2025Updated 9 months ago
- Optimize tensor programs fast with Felix, a gradient descent autotuner.☆32Apr 27, 2024Updated last year
- ☆24Mar 15, 2023Updated 2 years ago