srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆816 · Updated 8 months ago
Related projects:
- Puzzles for learning Triton ☆966 · Updated this week
- Puzzles for exploring transformers ☆293 · Updated last year
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆662 · Updated last month
- Building blocks for foundation models. ☆345 · Updated 8 months ago
- 🤖 A PyTorch library of curated Transformer models and their composable components ☆861 · Updated 5 months ago
- Minimalistic large language model 3D-parallelism training ☆1,111 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆452 · Updated last week
- Annotated version of the Mamba paper ☆445 · Updated 6 months ago
- TensorDict is a PyTorch-dedicated tensor container. ☆807 · Updated this week
- Fast & simple repository for pre-training and fine-tuning T5-style models ☆957 · Updated 3 weeks ago
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,284 · Updated 3 months ago
- CUDA-related news and material links ☆1,079 · Updated 2 weeks ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and JAX ☆492 · Updated this week
- Fast and flexible reference benchmarks ☆435 · Updated last month
- A Data Streaming Library for Efficient Neural Network Training ☆1,076 · Updated this week
- A repository for research on medium-sized language models. ☆469 · Updated 3 weeks ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆321 · Updated 2 weeks ago
- A library for mechanistic interpretability of GPT-style language models ☆1,413 · Updated this week
- Tile primitives for speedy kernels ☆1,489 · Updated this week
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆248 · Updated 10 months ago
- Helpful tools and examples for working with flex-attention ☆341 · Updated last month
- Maximal update parametrization (µP) ☆1,334 · Updated 2 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆451 · Updated last month
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆530 · Updated 3 months ago
- Open-weights language model from Google DeepMind, based on Griffin. ☆592 · Updated 2 months ago
- An interactive exploration of Transformer programming. ☆243 · Updated 10 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,099 · Updated 7 months ago
- Everything you want to know about Google Cloud TPU ☆486 · Updated 2 months ago