nebius / kvax
A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism.
☆158 Updated 3 months ago
Alternatives and similar repositories for kvax
Users who are interested in kvax are comparing it to the libraries listed below.
- Minimal yet performant LLM examples in pure JAX ☆240 Updated 3 weeks ago
- JAX-Toolbox ☆382 Updated this week
- jax-triton contains integrations between JAX and OpenAI Triton ☆437 Updated 2 months ago
- torchax is a PyTorch frontend for JAX. It gives JAX the ability to author JAX programs using familiar PyTorch syntax. It also provides JA… ☆175 Updated last week
- ☆291 Updated last year
- A simple library for scaling up JAX programs ☆145 Updated 3 months ago
- seqax = sequence modeling + JAX ☆170 Updated 6 months ago
- 🧱 Modula software package ☆322 Updated 5 months ago
- a Jax quantization library ☆90 Updated last week
- Minimal, lightweight JAX implementations of popular models. ☆191 Updated this week
- Tokamax: A GPU and TPU kernel library. ☆170 Updated this week
- Dion optimizer algorithm ☆431 Updated 3 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆71 Updated this week
- Accelerated First Order Parallel Associative Scan ☆196 Updated last month
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆131 Updated this week
- Implementation of Diffusion Transformer (DiT) in JAX ☆306 Updated last year
- MoE training for Me and You and maybe other people ☆353 Updated this week
- Implementation of Flash Attention in Jax ☆225 Updated last year
- Minimal but scalable implementation of large language models in JAX ☆35 Updated 2 months ago
- ☆307 Updated this week
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism. ☆115 Updated last month
- Efficient optimizers ☆281 Updated last month
- A Jax-based library for building transformers, includes implementations of GPT, Gemma, LlaMa, Mixtral, Whisper, SWin, ViT and more. ☆300 Updated last year
- JAX bindings for Flash Attention v2 ☆103 Updated last week
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆174 Updated 3 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆693 Updated 2 weeks ago
- ☆27 Updated last year
- Named Tensors for Legible Deep Learning in JAX ☆218 Updated 3 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 Updated 9 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆98 Updated 6 months ago