nebius / kvaxLinks

A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism.

☆144

Alternatives and similar repositories for kvax

Users that are interested in kvax are comparing it to the libraries listed below

Sorting:

jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆181Updated 2 weeks ago
young-geng / scalax
A simple library for scaling up JAX programs
☆143Updated 11 months ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆167Updated 2 months ago
jax-ml / jax-triton
jax-triton contains integrations between JAX and OpenAI Triton
☆426Updated this week
NVIDIA / JAX-Toolbox
JAX-Toolbox
☆348Updated this week
modula-systems / modula
🧱 Modula software package
☆282Updated last month
google-deepmind / nanodo
☆282Updated last year
microsoft / dion
Dion optimizer algorithm
☆361Updated last week
nshepperd / flash_attn_jax
JAX bindings for Flash Attention v2
☆95Updated last month
divyamakkar0 / JAXformer
A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.
☆100Updated 2 weeks ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆58Updated 2 weeks ago
kvfrans / jax-diffusion-transformer
Implementation of Diffusion Transformer (DiT) in JAX
☆292Updated last year
apple / ml-ademamix
☆67Updated 10 months ago
AllanYangZhou / midGPT
Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox.
☆24Updated last year
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189Updated last year
jax-ml / bonsai
Minimal, lightweight JAX implementations of popular models.
☆110Updated this week
evanatyourservice / kron_torch
An implementation of PSGD Kron second-order optimizer for PyTorch
☆95Updated 2 months ago
NX-AI / flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
☆98Updated 4 months ago
m-a-n-i-f-e-s-t / power-attention
Attention Kernels for Symmetric Power Transformers
☆120Updated 2 weeks ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆192Updated 10 months ago
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated last month
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83Updated 10 months ago
HenryNdubuaku / nanodl
A Jax-based library for building transformers, includes implementations of GPT, Gemma, LlaMa, Mixtral, Whisper, SWin, ViT and more.
☆293Updated last year
yixiaoer / einshard
Einsum-like high-level array sharding API for JAX
☆35Updated last year
marin-community / haliax
Named Tensors for Legible Deep Learning in JAX
☆207Updated last week
HomebrewML / HeavyBall
Efficient optimizers
☆269Updated this week
google-research / kauldron
Modular, scalable library to train ML models
☆166Updated this week
google-deepmind / jmp
JMP is a Mixed Precision library for JAX.
☆207Updated 8 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆130Updated 3 months ago