erfanzar / jax-flash-attn2

A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).

☆28

Alternatives and similar repositories for jax-flash-attn2

Users who are interested in jax-flash-attn2 are comparing it to the libraries listed below.

young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35 Updated last month
young-geng / mlxu
Machine Learning eXperiment Utilities
☆46 Updated 2 months ago
young-geng / scalax
A simple library for scaling up JAX programs
☆144 Updated 11 months ago
berlino / seq_icl
☆53 Updated last year
cloneofsimo / min-fsdp
☆91 Updated last year
erfanzar / eformer
(EasyDel Former) is a utility library designed to simplify and enhance the development in JAX
☆28 Updated this week
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78 Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆17 Updated last year
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆97 Updated last week
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆131 Updated 10 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆168 Updated 4 months ago
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆186 Updated last month
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84 Updated 11 months ago
sustcsonglin / mamba-triton
☆48 Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
Sea-Snell / JAX_llama
Inference code for LLaMA models in JAX
☆119 Updated last year
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35 Updated last year
srush / mamba-primer
☆38 Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86 Updated 3 years ago
davisyoshida / qax
If it quacks like a tensor...
☆59 Updated 11 months ago
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54 Updated last month
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168 Updated 3 months ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆141 Updated last year
Jaykef / Triton-nanoGPT
Custom triton kernels for training Karpathy's nanoGPT.
☆19 Updated last year
Sea-Snell / JAXSeq
Train very large language models in Jax.
☆209 Updated 2 years ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83 Updated 10 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
shikaiqiu / compute-better-spent
☆58 Updated last year
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆109 Updated 5 months ago
sholtodouglas / scalingExperiments
☆62 Updated 3 years ago