TileLang / tvm
Open deep learning compiler stack for CPU, GPU and specialized accelerators
☆14 · Updated this week
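As a quick illustration of the kind of workflow TVM supports, here is a minimal compile-and-run sketch using the tensor expression (TE) API. The exact API surface varies across TVM releases (newer versions favor TensorIR schedules), so treat this as a sketch assuming a TVM build that still exposes `te.create_schedule`, not a definitive example.

```python
import numpy as np
import tvm
from tvm import te

# Declare a simple element-wise computation: B[i] = A[i] + 1
n = 1024
A = te.placeholder((n,), dtype="float32", name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

# Create a default schedule and compile for the local CPU
s = te.create_schedule(B.op)
func = tvm.build(s, [A, B], target="llvm", name="add_one")

# Run the compiled kernel and check the result against NumPy
dev = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
func(a, b)
np.testing.assert_allclose(b.numpy(), a.numpy() + 1.0, rtol=1e-5)
```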
Related projects
Alternatives and complementary repositories for tvm
- GPTQ inference TVM kernel ☆35 · Updated 6 months ago
- TensorRT LLM Benchmark Configuration ☆11 · Updated 3 months ago
- Decoding Attention is specially optimized for multi-head attention (MHA), using CUDA cores for the decoding stage of LLM inference. ☆23 · Updated last week
- Quantized Attention on GPU ☆29 · Updated last week
- Odysseus: Playground of LLM Sequence Parallelism ☆55 · Updated 4 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 · Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency ☆98 · Updated 2 months ago
- ☆55 · Updated 5 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆22 · Updated this week
- ☆46 · Updated last month
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆20 · Updated 4 months ago
- ☆79 · Updated 2 months ago
- Fast and memory-efficient exact attention ☆28 · Updated 2 weeks ago
- [WIP] Context parallel attention that works with torch.compile ☆20 · Updated last week
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- Summary of system papers/frameworks/code/tools on training or serving large models ☆56 · Updated 10 months ago
- ☆29 · Updated 5 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆53 · Updated last week
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆34 · Updated 8 months ago
- ☆27 · Updated this week
- ☆24 · Updated last week
- ☆42 · Updated 11 months ago
- MLPerf™ logging library ☆30 · Updated last week
- ☆11 · Updated last year
- A layered, decoupled deep learning inference engine ☆60 · Updated 2 months ago
- ☆18 · Updated last month
- FP8 flash attention implemented on the Ada architecture using the CUTLASS repository ☆52 · Updated 3 months ago
- ☆40 · Updated last week
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆4 · Updated last week