hkproj / triton-flash-attention
☆110 · Updated 3 weeks ago
Alternatives and similar repositories for triton-flash-attention:
Users interested in triton-flash-attention are comparing it to the libraries listed below.
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆171 · Updated this week
- ring-attention experiments ☆119 · Updated 3 months ago
- Cataloging released Triton kernels. ☆157 · Updated 2 weeks ago
- Efficient LLM Inference over Long Sequences ☆349 · Updated last month
- LLM KV cache compression made easy ☆356 · Updated this week
- Triton implementation of GPT/LLAMA ☆16 · Updated 5 months ago
- LoRA and DoRA from Scratch Implementations ☆195 · Updated 10 months ago
- Applied AI experiments and examples for PyTorch ☆215 · Updated last week
- Fast low-bit matmul kernels in Triton ☆199 · Updated last week
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆261 · Updated 3 weeks ago
- Notes on quantization in neural networks ☆66 · Updated last year
- Collection of kernels written in Triton language ☆91 · Updated 3 months ago
- Prune transformer layers ☆67 · Updated 8 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆511 · Updated this week
- Distributed training (multi-node) of a Transformer model ☆50 · Updated 9 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆221 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆216 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆219 · Updated 5 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆21 · Updated 8 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆186 · Updated last week
- Normalized Transformer (nGPT) ☆146 · Updated 2 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 9 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆113 · Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆190 · Updated 6 months ago
- Complete implementation of Llama2 with/without KV cache & inference 🚀 ☆47 · Updated 8 months ago