JimyMa / FuncTs
[DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
☆15 Updated last year
Alternatives and similar repositories for FuncTs
Users that are interested in FuncTs are comparing it to the libraries listed below
- ☆123 Updated this week
- ☆43 Updated last year
- Summary of some awesome work for optimizing LLM inference ☆110 Updated 3 months ago
- ☆30 Updated last year
- A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of … ☆264 Updated 3 months ago
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆18 Updated last week
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆55 Updated last year
- A benchmark suited especially for deep learning operators ☆42 Updated 2 years ago
- TVM FFI ☆55 Updated this week
- High performance Transformer implementation in C++. ☆134 Updated 8 months ago
- ☆15 Updated 6 months ago
- Tile-based language built for AI computation across all scales ☆59 Updated last week
- DeepSeek-V3/R1 inference performance simulator ☆167 Updated 6 months ago
- WaferLLM: Large Language Model Inference at Wafer Scale ☆55 Updated last week
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM ☆63 Updated last month
- A lightweight design for computation-communication overlap. ☆171 Updated last week
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch ☆38 Updated 5 months ago
- Examples of CUDA implementations by Cutlass CuTe ☆233 Updated 2 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25) ☆57 Updated 5 months ago
- tutorials about polyhedral compilation. ☆54 Updated this week
- ☆85 Updated 5 months ago
- ☆106 Updated 4 months ago
- Triton multi-level runner, include IR/PTX/cubin. ☆54 Updated this week
- ☆19 Updated last year
- A Easy-to-understand TensorOp Matmul Tutorial ☆378 Updated last year
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆115 Updated 2 years ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆128 Updated last week
- nnScaler: Compiling DNN models for Parallel Training ☆118 Updated 3 weeks ago
- ☆140 Updated 4 months ago
- ☆13 Updated last year