IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well-known deployments.
☆20, updated this week
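For context, here is a minimal sketch of the standard `@triton.autotune` step whose cost triton-dejavu targets: on each fresh process, the autotuner re-benchmarks every `triton.Config` before the best one is used. The kernel and config values below are illustrative, not taken from this repo; per the project description, triton-dejavu persists previously found winners so that well-known deployments skip this search entirely.

```python
# Minimal sketch of stock Triton autotuning (illustrative kernel, not from triton-dejavu).
# Each listed config is benchmarked the first time the kernel runs in a process;
# triton-dejavu's stated goal is to cache and restore these results across runs.
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 128}, num_warps=4),
        triton.Config({"BLOCK_SIZE": 1024}, num_warps=8),
    ],
    key=["n_elements"],  # re-tune only when this value changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)
```

The project presents itself as a cache-backed stand-in for this decorator (an assumption about its exact API based on the repo description), so existing autotuned kernels would keep their config search spaces while avoiding repeated benchmarking.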
Related projects
Alternatives and complementary repositories for triton-dejavu
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. (☆19, updated this week)
- Extensible collectives library in Triton. (☆72, updated last month)
- Simple and fast low-bit matmul kernels in CUDA / Triton. (☆145, updated this week)
- Boosting 4-bit inference kernels with 2:4 sparsity. (☆51, updated 2 months ago)
- TensorRT LLM benchmark configuration. (☆11, updated 3 months ago)
- Collection of kernels written in the Triton language. (☆68, updated 3 weeks ago)
- Applied AI experiments and examples for PyTorch. (☆166, updated 3 weeks ago)
- Memory Optimizations for Deep Learning (ICML 2023). (☆60, updated 8 months ago)
- Efficient, flexible, and portable structured generation. (☆53, updated this week)
- A safetensors extension to efficiently store sparse quantized tensors on disk. (☆50, updated this week)
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry. (☆38, updated 10 months ago)
- Cataloging released Triton kernels. (☆134, updated 2 months ago)
- Odysseus: playground of LLM sequence parallelism. (☆57, updated 5 months ago)
- Repository for CPU kernel generation for LLM inference. (☆25, updated last year)
- GPTQ inference TVM kernel. (☆36, updated 6 months ago)
- PyTorch bindings for CUTLASS grouped GEMM. (☆53, updated 3 weeks ago)
- Experiment in using Tangent to autodiff Triton. (☆72, updated 9 months ago)
- FlexAttention with FlashAttention3 support. (☆27, updated last month)
- Example of applying CUDA graphs to LLaMA-v2. (☆10, updated last year)
- Ring-attention experiments. (☆97, updated last month)
- Prototype routines for GPU quantization written using PyTorch. (☆19, updated last week)
- An experimental CPU backend for Triton (https://github.com/openai/triton). (☆35, updated 6 months ago)
- LLM KV cache compression made easy. (☆64, updated last week)