nebius / kvax
A FlashAttention implementation for JAX with support for efficient document mask computation and context parallelism.
☆85 Updated last month
Alternatives and similar repositories for kvax:
Users who are interested in kvax are comparing it to the libraries listed below.
- ☆87 Updated 2 weeks ago
- A simple library for scaling up JAX programs ☆134 Updated 5 months ago
- Einsum-like high-level array sharding API for JAX ☆35 Updated 8 months ago
- Minimal but scalable implementation of large language models in JAX ☆34 Updated 5 months ago
- JAX implementation of the Mistral 7b v0.2 model ☆35 Updated 8 months ago
- JAX bindings for Flash Attention v2 ☆89 Updated 8 months ago
- supporting pytorch FSDP for optimizers ☆80 Updated 3 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆76 Updated last week
- Machine Learning eXperiment Utilities ☆46 Updated 9 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆78 Updated last year
- Custom triton kernels for training Karpathy's nanoGPT. ☆18 Updated 5 months ago
- Implementation of Flash Attention in Jax ☆207 Updated last year
- Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox. ☆24 Updated 6 months ago
- jax-triton contains integrations between JAX and OpenAI Triton ☆388 Updated this week
- ☆215 Updated 8 months ago
- Accelerated First Order Parallel Associative Scan ☆180 Updated 7 months ago
- JMP is a Mixed Precision library for JAX. ☆193 Updated 2 months ago
- seqax = sequence modeling + JAX ☆151 Updated 2 weeks ago
- Tensor Parallelism with JAX + Shard Map ☆11 Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆49 Updated this week
- Jax like function transformation engine but micro, microjax ☆30 Updated 5 months ago
- ☆76 Updated 8 months ago
- Named Tensors for Legible Deep Learning in JAX ☆168 Updated this week
- ☆147 Updated this week
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆109 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆35 Updated this week
- This is a port of Mistral-7B model in JAX ☆32 Updated 9 months ago
- ☆60 Updated 3 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 8 months ago