Sea-Snell / JAX_llama
Inference code for LLaMA models in JAX
☆108 · Updated 3 months ago
Related projects:
- Train very large language models in Jax. ☆191 · Updated 10 months ago
- JAX implementation of the Llama 2 model. ☆205 · Updated 7 months ago
- LoRA for arbitrary JAX models and functions. ☆127 · Updated 6 months ago
- Some common Huggingface transformers in maximal update parametrization (µP). ☆76 · Updated 2 years ago
- seqax = sequence modeling + JAX. ☆129 · Updated 2 months ago
- A simple library for scaling up JAX programs. ☆116 · Updated last month
- Implementation of the specific Transformer architecture from PaLM (Scaling Language Modeling with Pathways) in Jax (Equinox framework). ☆184 · Updated 2 years ago
- JAX Synergistic Memory Inspector. ☆161 · Updated 2 months ago
- Minimal (400 LOC) implementation of maximal (multi-node, FSDP) GPT training. ☆110 · Updated 5 months ago
- If it quacks like a tensor... ☆48 · Updated 7 months ago
- Language models scale reliably with over-training and on downstream tasks. ☆91 · Updated 5 months ago
- A MAD laboratory to improve AI architecture designs 🧪. ☆84 · Updated 4 months ago
- Machine Learning eXperiment Utilities. ☆42 · Updated 3 months ago
- Understand and test language model architectures on synthetic tasks. ☆156 · Updated 4 months ago
- JAX bindings for Flash Attention v2. ☆75 · Updated 2 months ago
- Multipack distributed sampler for fast padding-free training of LLMs. ☆169 · Updated last month
- Implementation of Flash Attention in Jax. ☆188 · Updated 6 months ago
- [NeurIPS 2023] Learning Transformer Programs. ☆157 · Updated 3 months ago
- Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference… ☆190 · Updated 3 months ago
- Exploring finetuning public checkpoints on filtered 8K sequences on the Pile. ☆115 · Updated last year