zhaiyi000 / tlm
☆48Jul 13, 2024Updated last year
Alternatives and similar repositories for tlm
Users that are interested in tlm are comparing it to the libraries listed below
- Optimize tensor programs fast with Felix, a gradient descent autotuner.☆30Apr 27, 2024Updated last year
- ☆41Apr 25, 2024Updated last year
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆56May 29, 2024Updated last year
- ☆17Jan 24, 2024Updated 2 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning☆142Mar 31, 2023Updated 2 years ago
- ☆95Nov 4, 2022Updated 3 years ago
- Horizontal Fusion☆24Jan 7, 2022Updated 4 years ago
- ☆13Dec 31, 2023Updated 2 years ago
- ☆12Jan 7, 2025Updated last year
- CodeBERT based mutation testing tool.☆13Nov 10, 2025Updated 3 months ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators☆121Oct 26, 2022Updated 3 years ago
- Official implementation for the paper Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapp…☆14Nov 17, 2025Updated 2 months ago
- ☆32Jul 17, 2024Updated last year
- Tutorials of Extending and importing TVM with CMAKE Include dependency.☆16Oct 11, 2024Updated last year
- A resilient distributed training framework☆96Apr 11, 2024Updated last year
- Tencent Distribution of TVM☆15Apr 7, 2023Updated 2 years ago
- DietCode Code Release☆65Jul 21, 2022Updated 3 years ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration☆199Apr 27, 2022Updated 3 years ago
- Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Feb 23, 2024Updated last year
- ☆38Jun 27, 2025Updated 7 months ago
- Compiler for Dynamic Neural Networks☆45Nov 13, 2023Updated 2 years ago
- Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training☆24Mar 1, 2024Updated last year
- Triton to TVM transpiler.☆22Oct 14, 2024Updated last year
- Implement Flash Attention using Cute.☆100Dec 17, 2024Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 7 months ago
- Torch Frontend for IREE☆25Dec 21, 2023Updated 2 years ago
- Automatic Schedule Exploration and Optimization Framework for Tensor Computations☆182Apr 25, 2022Updated 3 years ago
- Xtext project to parse CoreDSL files☆24Oct 17, 2025Updated 3 months ago
- play gemm with tvm☆91Jul 22, 2023Updated 2 years ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity☆234Sep 24, 2023Updated 2 years ago
- ☆172Updated this week
- This is the repo for an incremental pointer analysis for Java programs. This repo has been adopted by WALA☆25Feb 13, 2023Updated 3 years ago
- ☆145Jan 30, 2025Updated last year
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…☆104Dec 24, 2022Updated 3 years ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆192Jan 28, 2025Updated last year
- ☆288Feb 4, 2026Updated last week
- A pattern-based algorithmic autotuner for graph processing on GPUs.☆32Jun 25, 2025Updated 7 months ago
- A list of awesome compiler projects and papers for tensor computation and deep learning.☆2,731Oct 19, 2024Updated last year
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling☆68May 1, 2024Updated last year