moritztng / grayskull-attention
Attention in SRAM on Tenstorrent Grayskull
☆22Updated 2 months ago
Related projects:
- LLM training in simple, raw C/CUDA☆79Updated 4 months ago
- Tenstorrent MLIR compiler☆52Updated this week
- ☆124Updated last week
- Learning about CUDA by writing PTX code.☆28Updated 6 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization☆73Updated 3 weeks ago
- Simple and fast low-bit matmul kernels in CUDA☆48Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆30Updated 4 months ago
- ☆27Updated 3 weeks ago
- Cataloging released Triton kernels.☆111Updated 3 weeks ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs☆156Updated this week
- Repository of model demos using TT-Buda☆54Updated this week
- IREE's PyTorch Frontend, based on Torch Dynamo.☆43Updated this week
- ring-attention experiments☆89Updated 5 months ago
- Collection of kernels written in Triton language☆48Updated 2 weeks ago
- GPU benchmark☆35Updated 2 weeks ago
- ☆14Updated 4 months ago
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit☆103Updated 5 months ago
- llama INT4 cuda inference with AWQ☆46Updated 2 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface☆87Updated 3 months ago
- Code for Palu: Compressing KV-Cache with Low-Rank Projection☆39Updated this week
- ☆82Updated 6 months ago
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline.☆81Updated 2 months ago
- Applied AI experiments and examples for PyTorch☆123Updated last month
- ☆30Updated 3 months ago
- Example of applying CUDA graphs to LLaMA-v2☆10Updated last year
- Personal solutions to the Triton Puzzles☆11Updated 2 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆94Updated 2 weeks ago
- Standalone Flash Attention v2 kernel without libtorch dependency☆93Updated last week
- GPTQ inference TVM kernel☆35Updated 4 months ago
- ☆22Updated last week