nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆83 · Updated 6 months ago
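The listing itself doesn't show the binding's API. As a minimal sketch only: assuming the library exposes a `flash_mha(q, k, v, is_causal=...)` entry point over `[batch, seq_len, num_heads, head_dim]` arrays (both the function name and signature are assumptions here, not confirmed by this page), usage would look roughly like:

```python
import jax
import jax.numpy as jnp
from flash_attn_jax import flash_mha  # assumed import path / function name

# Shapes follow the usual [batch, seq_len, num_heads, head_dim] convention.
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (2, 1024, 8, 64), dtype=jnp.float16)
k = jax.random.normal(key, (2, 1024, 8, 64), dtype=jnp.float16)
v = jax.random.normal(key, (2, 1024, 8, 64), dtype=jnp.float16)

# Causal flash attention; composes with jit like any other JAX function.
out = jax.jit(lambda q, k, v: flash_mha(q, k, v, is_causal=True))(q, k, v)
print(out.shape)  # (2, 1024, 8, 64)
```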
Alternatives and similar repositories for flash_attn_jax:
Users interested in flash_attn_jax are comparing it to the libraries listed below
- A simple library for scaling up JAX programs · ☆129 · Updated 2 months ago
- LoRA for arbitrary JAX models and functions · ☆135 · Updated 10 months ago
- Accelerated First Order Parallel Associative Scan · ☆169 · Updated 4 months ago
- ☆135 · Updated last year
- Experiment of using Tangent to autodiff triton · ☆74 · Updated 11 months ago
- A library for unit scaling in PyTorch · ☆118 · Updated last month
- seqax = sequence modeling + JAX · ☆136 · Updated 6 months ago
- ☆75 · Updated 6 months ago
- ☆46 · Updated 11 months ago
- Implementation of Flash Attention in Jax · ☆204 · Updated 10 months ago
- A set of Python scripts that makes your experience on TPU better · ☆44 · Updated 6 months ago
- Minimal but scalable implementation of large language models in JAX · ☆28 · Updated 2 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN · ☆67 · Updated 7 months ago
- Triton-based implementation of Sparse Mixture of Experts · ☆192 · Updated last month
- CUDA implementation of autoregressive linear attention, with all the latest research findings · ☆44 · Updated last year
- Simple and efficient pytorch-native transformer training and inference (batched) · ☆66 · Updated 9 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness · ☆69 · Updated last month
- ring-attention experiments · ☆116 · Updated 3 months ago
- If it quacks like a tensor... · ☆55 · Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ☆102 · Updated last month
- Fast and memory-efficient exact attention · ☆52 · Updated last month
- ☆83 · Updated 7 months ago
- ☆53 · Updated 11 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ☆219 · Updated last month
- Inference code for LLaMA models in JAX · ☆114 · Updated 7 months ago
- ☆51 · Updated 7 months ago
- ☆85 · Updated 10 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference · ☆55 · Updated last month
- ☆215 · Updated 8 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 · ☆43 · Updated 6 months ago