zinccat / flaxattentionLinks

☆25

Alternatives and similar repositories for flaxattention

Users that are interested in flaxattention are comparing it to the libraries listed below

Sorting:

nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆97Updated last week
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated last month
modula-systems / modula
🧱 Modula software package
☆299Updated 2 months ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168Updated 3 months ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83Updated 10 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 3 months ago
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆187Updated last month
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189Updated last year
google-deepmind / nanodo
☆283Updated last year
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆32Updated 4 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆170Updated 4 months ago
nebius / kvax
A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism.
☆147Updated 6 months ago
cloneofsimo / min-fsdp
☆91Updated last year
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆34Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆234Updated last month
davisyoshida / qax
If it quacks like a tensor...
☆59Updated 11 months ago
mgmalek / efficient_cross_entropy
☆121Updated last year
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216Updated last year
erfanzar / EasyDeL
Accelerate, Optimize performance with streamlined training and serving options with JAX.
☆317Updated last week
fattorib / ZeRO-transformer
Two implementations of ZeRO-1 optimizer sharding in JAX
☆14Updated 2 years ago
jenkspt / gpt-jax
Jax/Flax rewrite of Karpathy's nanoGPT
☆62Updated 2 years ago
erfanzar / jax-flash-attn2
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/…
☆29Updated 7 months ago
ayaka14732 / jax-smi
JAX Synergistic Memory Inspector
☆179Updated last year
glassroom / heinsen_sequence
Code implementing "Efficient Parallelization of a Ubiquitious Sequential Computation" (Heinsen, 2023)
☆95Updated 10 months ago
HomebrewML / HeavyBall
Efficient optimizers
☆276Updated 2 weeks ago
young-geng / scalax
A simple library for scaling up JAX programs
☆144Updated last year
lucidrains / flash-attention-jax
Implementation of Flash Attention in Jax
☆219Updated last year
Jaykef / Triton-nanoGPT
Custom triton kernels for training Karpathy's nanoGPT.
☆19Updated last year
google / aqt
☆335Updated last month