CLAIRE-Labo / flash_attention
A basic pure PyTorch implementation of flash attention
☆16 · Updated 8 months ago
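For context, the sketch below illustrates the core idea behind a pure-PyTorch flash attention: block-wise processing of keys/values with an online softmax, so the full attention matrix is never materialized. This is an illustrative example only, not code from the CLAIRE-Labo repository; the function name, block size, and tensor shapes are assumptions.

```python
# Minimal sketch of block-wise attention with an online softmax (the core idea
# of flash attention) in plain PyTorch. Not the repository's implementation.
import torch


def flash_attention_blockwise(q, k, v, block_size=64):
    """softmax(q @ k^T / sqrt(d)) @ v computed one key/value block at a time.

    q, k, v: (seq_len, head_dim). A running row max and row sum are kept so the
    softmax is never materialized over the full (seq_len, seq_len) score matrix.
    """
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5

    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"), dtype=q.dtype)
    row_sum = torch.zeros((seq_len, 1), dtype=q.dtype)

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]           # (B, d)
        v_blk = v[start:start + block_size]           # (B, d)
        scores = (q @ k_blk.T) * scale                # (seq_len, B)

        # Online softmax update: rescale the previous partial results so they
        # are consistent with the new running maximum.
        blk_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, blk_max)
        exp_scores = torch.exp(scores - new_max)
        correction = torch.exp(row_max - new_max)

        row_sum = row_sum * correction + exp_scores.sum(dim=-1, keepdim=True)
        out = out * correction + exp_scores @ v_blk
        row_max = new_max

    return out / row_sum


if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(256, 32) for _ in range(3))
    reference = torch.softmax((q @ k.T) * 32 ** -0.5, dim=-1) @ v
    assert torch.allclose(flash_attention_blockwise(q, k, v), reference, atol=1e-4)
```

The trade-off is the usual one: memory drops from O(seq_len²) to O(seq_len · block_size) at the cost of a Python-level loop, which a fused kernel (as in the real FlashAttention) avoids.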
Alternatives and similar repositories for flash_attention
Users interested in flash_attention are comparing it to the libraries listed below:
- ☆80 · Updated last year
- Triton implementation of the HyperAttention algorithm · ☆48 · Updated last year
- Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training · ☆129 · Updated last year
- Code for the paper "Function-Space Learning Rates" · ☆20 · Updated last month
- Code for the NeurIPS 2024 Spotlight paper "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆75 · Updated 8 months ago
- ☆37 · Updated last year
- Supporting PyTorch FSDP for optimizers · ☆82 · Updated 7 months ago
- The evaluation framework for training-free sparse attention in LLMs · ☆83 · Updated 3 weeks ago
- ☆53 · Updated last year
- ☆82 · Updated 10 months ago
- LL3M: Large Language and Multi-Modal Model in JAX · ☆72 · Updated last year
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" · ☆94 · Updated last month
- Explorations into the recently proposed Taylor Series Linear Attention · ☆99 · Updated 10 months ago
- ☆81 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs · ☆147 · Updated 2 weeks ago
- ☆37 · Updated 3 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun · ☆55 · Updated 4 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto · ☆56 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference · ☆78 · Updated last month
- Language models scale reliably with over-training and on downstream tasks · ☆97 · Updated last year
- ☆55 · Updated 7 months ago
- Machine Learning eXperiment Utilities · ☆46 · Updated 2 weeks ago
- Unofficial but efficient implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX · ☆84 · Updated last year
- Using FlexAttention to compute attention with different masking patterns · ☆44 · Updated 9 months ago
- WIP · ☆93 · Updated 11 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆27 · Updated last year
- Implementations of attention with the softpick function, naive and FlashAttention-2 · ☆80 · Updated 2 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) · ☆77 · Updated last year
- ☆31 · Updated 7 months ago
- ☆87 · Updated last year