nebius / kvax
A FlashAttention implementation for JAX, with support for efficient document-mask computation and context parallelism.
☆147 · Updated 6 months ago
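For readers unfamiliar with document masking, the sketch below illustrates the idea in plain JAX: when several documents are packed into one sequence, attention is restricted to a block-diagonal pattern so tokens only attend within their own document. This is a minimal toy, not kvax's actual API; the names `document_mask`, `masked_attention`, and the `segment_ids` input are hypothetical.

```python
# Minimal, illustrative sketch of a document (block-diagonal) attention mask
# for packed sequences, in plain JAX. NOT kvax's API.
import jax
import jax.numpy as jnp

def document_mask(segment_ids):
    # True where the query token and key token belong to the same packed
    # document. segment_ids: (batch, seq) -> mask: (batch, seq, seq).
    return segment_ids[:, :, None] == segment_ids[:, None, :]

def masked_attention(q, k, v, segment_ids):
    # q, k, v: (batch, seq, heads, head_dim)
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum("bqhd,bkhd->bhqk", q, k) * scale
    mask = document_mask(segment_ids)[:, None, :, :]  # broadcast over heads
    logits = jnp.where(mask, logits, -jnp.inf)        # block out other docs
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("bhqk,bkhd->bqhd", weights, v)

# Example: two documents of lengths 3 and 2 packed into one sequence of 5.
B, T, H, D = 1, 5, 2, 4
q = jax.random.normal(jax.random.PRNGKey(0), (B, T, H, D))
segs = jnp.array([[0, 0, 0, 1, 1]])
out = masked_attention(q, q, q, segs)
print(out.shape)  # (1, 5, 2, 4)
```

The point of a fused FlashAttention-style kernel is to avoid ever materializing this seq × seq score matrix, computing the mask tile by tile instead; a naive implementation like the one above is only practical for short sequences.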
Alternatives and similar repositories for kvax
Users interested in kvax are comparing it to the libraries listed below.
- Minimal yet performant LLM examples in pure JAX ☆187 · Updated last month
- ☆114 · Updated this week
- ☆283 · Updated last year
- seqax = sequence modeling + JAX ☆168 · Updated 3 months ago
- A simple library for scaling up JAX programs ☆144 · Updated last year
- jax-triton contains integrations between JAX and OpenAI Triton ☆429 · Updated 2 weeks ago
- JAX-Toolbox ☆356 · Updated this week
- Minimal but scalable implementation of large language models in JAX ☆35 · Updated 2 months ago
- Dion optimizer algorithm ☆374 · Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆58 · Updated 3 weeks ago
- 🧱 Modula software package ☆299 · Updated 2 months ago
- JAX bindings for Flash Attention v2 ☆97 · Updated last week
- A set of Python scripts that makes your experience on TPU better ☆54 · Updated last month
- ☆68 · Updated 11 months ago
- An experiment in using Tangent to autodiff Triton ☆80 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆189 · Updated last year
- PyTorch-like dataloaders for JAX ☆93 · Updated 5 months ago
- JAX implementation of the Mistral 7b v0.2 model ☆34 · Updated last year
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆441 · Updated last week
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆96 · Updated 3 months ago
- Attention Kernels for Symmetric Power Transformers ☆121 · Updated last month
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆103 · Updated 2 weeks ago
- ☆190 · Updated last week
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism ☆104 · Updated last month
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with JAX and Equinox ☆24 · Updated last year
- ☆89 · Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ☆147 · Updated 2 years ago
- Custom Triton kernels for training Karpathy's nanoGPT ☆19 · Updated last year
- A library for unit scaling in PyTorch ☆132 · Updated 3 months ago
- train with kittens! ☆63 · Updated last year