epfml / dynamic-sparse-flash-attention
☆132 · Updated last year
Related projects
Alternatives and complementary repositories for dynamic-sparse-flash-attention
- Triton-based implementation of Sparse Mixture of Experts. ☆185 · Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆214 · Updated this week
- Understand and test language model architectures on synthetic tasks. ☆163 · Updated 6 months ago
- ☆77 · Updated 5 months ago
- ☆74 · Updated 11 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆195 · Updated 3 months ago
- JAX bindings for Flash Attention v2 ☆80 · Updated 4 months ago
- Some preliminary explorations of Mamba's context scaling. ☆191 · Updated 9 months ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆86 · Updated 9 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆212 · Updated 3 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 · Updated 6 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆78 · Updated 3 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆61 · Updated 7 months ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆110 · Updated 8 months ago
- Some common Huggingface transformers in maximal update parametrization (µP) ☆77 · Updated 2 years ago
- ☆88 · Updated 2 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated 2 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆212 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆164 · Updated 3 months ago
- A fusion of a linear layer and a cross entropy loss, written for PyTorch in Triton. ☆54 · Updated 3 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆194 · Updated this week
- ☆96 · Updated 2 months ago
- A toolkit for scaling law research ⚖ ☆43 · Updated 8 months ago
- A library for unit scaling in PyTorch ☆105 · Updated 2 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆149 · Updated 4 months ago
- ☆98 · Updated 8 months ago
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆78 · Updated 8 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆477 · Updated 3 weeks ago
- Inference code for LLaMA models in JAX ☆113 · Updated 6 months ago
- ☆50 · Updated 6 months ago