CLAIRE-Labo / flash_attentionLinks

A basic pure pytorch implementation of flash attention

☆16

Alternatives and similar repositories for flash_attention

Users that are interested in flash_attention are comparing it to the libraries listed below

Sorting:

cloneofsimo / min-fsdp
☆91Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆56Updated 7 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 6 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84Updated last year
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆63Updated 3 weeks ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23Updated 4 months ago
jiasenlu / LL3M
LL3M: Large Language and Multi-Modal Model in Jax
☆74Updated last year
RobertCsordas / moeut
☆86Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83Updated 10 months ago
test-time-training / ttt-tk
☆41Updated 2 weeks ago
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78Updated last year
ChenWu98 / algorithmic-creativity
[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
☆72Updated 5 months ago
srush / mamba-primer
☆38Updated last year
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆80Updated last month
tyler-romero / microR1
Simple repository for training small reasoning models
☆44Updated 8 months ago
google-deepmind / spectral_ssm
☆34Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆170Updated 4 months ago
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆102Updated this week
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆99Updated last year
zaydzuhri / softpick-attention
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
☆85Updated last month
idiap / sigma-gpt
σ-GPT: A New Approach to Autoregressive Models
☆68Updated last year
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆109Updated 3 weeks ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
dvruette / gidd
Code accompanying the paper "Generalized Interpolating Discrete Diffusion"
☆106Updated 4 months ago
epfml / DenseFormer
☆81Updated last year
lucidrains / self-reasoning-tokens-pytorch
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
☆57Updated last year