JimyMa / FuncTs
[DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning
☆15 · Updated last year
Alternatives and similar repositories for FuncTs
Users who are interested in FuncTs are comparing it to the libraries listed below.
- ☆45 · Updated last year
- Summary of some awesome work for optimizing LLM inference ☆134 · Updated last week
- ☆131 · Updated 2 weeks ago
- Open ABI and FFI for Machine Learning Systems ☆152 · Updated last week
- Tile-based language built for AI computation across all scales ☆74 · Updated this week
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆47 · Updated last month
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25) ☆66 · Updated 6 months ago
- DeepSeek-V3/R1 inference performance simulator ☆170 · Updated 7 months ago
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM ☆63 · Updated 2 months ago
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin. ☆76 · Updated last week
- High performance Transformer implementation in C++. ☆140 · Updated 9 months ago
- A lightweight design for computation-communication overlap. ☆183 · Updated last month
- ☆124 · Updated 11 months ago
- An Easy-to-understand TensorOp Matmul Tutorial ☆389 · Updated last month
- Building the Virtuous Cycle for AI-driven LLM Systems ☆82 · Updated this week
- A torch compile backend for multi-targets ☆40 · Updated this week
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆285 · Updated 5 months ago
- ☆16 · Updated 8 months ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆157 · Updated last year
- WaferLLM: Large Language Model Inference at Wafer Scale ☆65 · Updated last week
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆55 · Updated last year
- nnScaler: Compiling DNN models for Parallel Training ☆118 · Updated last month
- LLM Inference analyzer for different hardware platforms ☆94 · Updated 4 months ago
- Development repository for the Triton-Linalg conversion ☆204 · Updated 9 months ago
- Tutorials about polyhedral compilation. ☆55 · Updated 2 weeks ago
- ☆154 · Updated 6 months ago
- Examples of CUDA implementations by Cutlass CuTe ☆246 · Updated 4 months ago
- ☆199 · Updated 2 weeks ago
- From Minimal GEMM to Everything ☆67 · Updated last month
- A benchmark suited especially for deep learning operators ☆42 · Updated 2 years ago