nebius / kvax
A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism.
☆157 · Updated 2 months ago
Alternatives and similar repositories for kvax
Users who are interested in kvax are comparing it to the libraries listed below.
- Minimal yet performant LLM examples in pure JAX ☆233 · Updated 2 weeks ago
- a Jax quantization library ☆87 · Updated this week
- torchax is a PyTorch frontend for JAX. It gives JAX the ability to author JAX programs using familiar PyTorch syntax. It also provides JA… ☆171 · Updated this week
- seqax = sequence modeling + JAX ☆170 · Updated 6 months ago
- A simple library for scaling up JAX programs ☆144 · Updated 2 months ago
- JAX-Toolbox ☆381 · Updated this week
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆174 · Updated 3 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 · Updated this week
- jax-triton contains integrations between JAX and OpenAI Triton ☆437 · Updated last month
- Accelerated First Order Parallel Associative Scan ☆196 · Updated 3 weeks ago
- ☆289 · Updated last year
- 🧱 Modula software package ☆322 · Updated 5 months ago
- MoE training for Me and You and maybe other people ☆331 · Updated 3 weeks ago
- JAX bindings for Flash Attention v2 ☆103 · Updated last month
- Implementation of Diffusion Transformer (DiT) in JAX ☆305 · Updated last year
- Attention Kernels for Symmetric Power Transformers ☆128 · Updated 4 months ago
- Dion optimizer algorithm ☆420 · Updated 2 weeks ago
- Efficient optimizers ☆280 · Updated last month
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox. ☆24 · Updated last year
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆98 · Updated 6 months ago
- Custom triton kernels for training Karpathy's nanoGPT. ☆19 · Updated last year
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism. ☆113 · Updated last month
- Minimal, lightweight JAX implementations of popular models. ☆180 · Updated this week
- ☆134 · Updated last month
- ☆300 · Updated this week
- Cost aware hyperparameter tuning algorithm ☆177 · Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆154 · Updated 2 years ago
- Minimal but scalable implementation of large language models in JAX ☆35 · Updated 2 months ago
- A set of Python scripts that makes your experience on TPU better ☆55 · Updated 4 months ago
- ☆92 · Updated last year