nshepperd / flash_attn_jaxLinks

JAX bindings for Flash Attention v2

☆99

Alternatives and similar repositories for flash_attn_jax

Users that are interested in flash_attn_jax are comparing it to the libraries listed below

Sorting:

proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆192Updated last year
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168Updated 4 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 4 months ago
epfml / dynamic-sparse-flash-attention
☆150Updated 2 years ago
cloneofsimo / min-fsdp
☆91Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 11 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated this week
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆212Updated 5 months ago
young-geng / scalax
A simple library for scaling up JAX programs
☆144Updated last month
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆243Updated 5 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240Updated 2 months ago
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆65Updated this week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆253Updated 2 months ago
mgmalek / efficient_cross_entropy
☆121Updated last year
test-time-training / ttt-tk
☆41Updated last month
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135Updated 11 months ago
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆204Updated 2 months ago
gpu-mode / ring-attention
ring-attention experiments
☆160Updated last year
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆74Updated 9 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆103Updated 2 months ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆143Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆91Updated 4 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆194Updated last year
HanGuo97 / log-linear-attention
☆256Updated 5 months ago
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated 2 months ago