thuml / learn_torch.compile
torch.compile artifacts for common deep learning models; can be used as a learning resource for torch.compile.
☆16 · Updated last year
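For context, a minimal sketch of the API this repository documents, assuming PyTorch 2.x; the model and shapes below are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

# A small model to compile; any nn.Module or plain function works.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# torch.compile captures the model with TorchDynamo and lowers it
# (by default) through TorchInductor to fused kernels.
compiled_model = torch.compile(model)

x = torch.randn(8, 64)
out = compiled_model(x)  # first call triggers compilation; later calls reuse it
print(out.shape)  # torch.Size([8, 10])
```

Running with the environment variable `TORCH_COMPILE_DEBUG=1` makes PyTorch dump intermediate compilation artifacts (captured FX graphs, generated Triton/C++ code) of the kind this repository appears to collect.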
Alternatives and similar repositories for learn_torch.compile:
Users interested in learn_torch.compile are comparing it to the libraries listed below.
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆55 · Updated this week
- An external memory allocator example for PyTorch. ☆14 · Updated 3 years ago
- IntLLaMA: A fast and light quantization solution for LLaMA. ☆18 · Updated last year
- Quantized Attention on GPU. ☆34 · Updated 2 months ago
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance. ☆51 · Updated last week
- Memory Optimizations for Deep Learning (ICML 2023). ☆62 · Updated 11 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆61 · Updated 3 weeks ago
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 2 months ago
- GPTQ inference TVM kernel. ☆38 · Updated 9 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆175 · Updated 3 weeks ago
- ☆11 · Updated 3 years ago
- Debug print operator for cudagraph debugging. ☆10 · Updated 6 months ago
- ☆59 · Updated 3 months ago
- A demo of how to write a high-performance convolution that runs on Apple silicon. ☆52 · Updated 3 years ago
- ☆19 · Updated 4 months ago
- A study of Ampere's sparse matmul. ☆16 · Updated 4 years ago
- A study of CUTLASS. ☆21 · Updated 3 months ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning. ☆134 · Updated last year
- ☆21 · Updated last week
- TensorRT LLM Benchmark Configuration. ☆13 · Updated 6 months ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. ☆19 · Updated last week
- ☆33 · Updated last month
- Implements Flash Attention using CuTe. ☆69 · Updated 2 months ago
- ☆27 · Updated 10 months ago
- An auxiliary project analyzing the characteristics of KV in DiT attention. ☆25 · Updated 2 months ago
- Standalone Flash Attention v2 kernel without a libtorch dependency (a sketch of the computation these kernels optimize follows this list). ☆104 · Updated 5 months ago
- Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference. ☆29 · Updated 3 months ago
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators. ☆18 · Updated 2 weeks ago
- ☆48 · Updated 11 months ago
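As referenced in the Flash Attention v2 entry above, several repositories in this list (the quantized, decoding-stage, and Flash Attention kernels) implement variants of the same scaled dot-product attention. A minimal PyTorch sketch of that computation, with illustrative shapes chosen here rather than taken from any of the listed repos:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch 2, 8 heads, 128 tokens, head dim 64.
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Naive reference: materializes the full (seq x seq) score matrix.
scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
naive = torch.softmax(scores, dim=-1) @ v

# Fused kernels (FlashAttention and friends) compute the same result
# in tiles without materializing the score matrix; PyTorch exposes
# such kernels through scaled_dot_product_attention.
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(naive, fused, atol=1e-5))  # True, up to float tolerance
```

The decoding-stage kernels above specialize this to a single query token attending over a long cached key/value sequence, which is why they are optimized separately from the prefill case.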