young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆28 · Updated 2 months ago
Alternatives and similar repositories for mintext:
Users interested in mintext are comparing it to the libraries listed below.
- ☆51 · Updated 7 months ago
- A simple library for scaling up JAX programs ☆129 · Updated 2 months ago
- Minimal (400 LOC) implementation of maximal (multi-node, FSDP) GPT training ☆121 · Updated 9 months ago
- ☆75 · Updated 6 months ago
- ☆37 · Updated 9 months ago
- A set of Python scripts that makes your experience on TPU better ☆44 · Updated 6 months ago
- JAX implementation of VQVAE/VQGAN autoencoders (+FSQ) ☆24 · Updated 7 months ago
- ☆53 · Updated 11 months ago
- ☆46 · Updated 11 months ago
- ☆75 · Updated 6 months ago
- TPU pod commander is a package for managing and launching jobs on Google Cloud TPU pods. ☆17 · Updated 6 months ago
- ☆31 · Updated last month
- Simple and efficient pytorch-native transformer training and inference (batched) ☆66 · Updated 9 months ago
- If it quacks like a tensor... ☆55 · Updated 2 months ago
- seqax = sequence modeling + JAX ☆136 · Updated 6 months ago
- supporting pytorch FSDP for optimizers ☆75 · Updated last month
- Learn online intrinsic rewards from LLM feedback ☆33 · Updated last month
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆90 · Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆102 · Updated last month
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆66 · Updated 2 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 · Updated last month
- Machine Learning eXperiment Utilities ☆45 · Updated 7 months ago
- Implementation of Direct Preference Optimization ☆15 · Updated last year
- LoRA for arbitrary JAX models and functions ☆135 · Updated 10 months ago
- A basic pure pytorch implementation of flash attention ☆16 · Updated 2 months ago
- Triton Implementation of HyperAttention Algorithm ☆46 · Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 9 months ago
- JAX bindings for Flash Attention v2 ☆83 · Updated 6 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆45 · Updated last month