srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆903 · Updated 10 months ago
Related projects
Alternatives and complementary repositories for LLM-Training-Puzzles
- Puzzles for exploring transformers · ☆325 · Updated last year
- Puzzles for learning Triton · ☆1,135 · Updated this week
- Annotated version of the Mamba paper · ☆457 · Updated 8 months ago
- A bibliography and survey of the papers surrounding o1 · ☆754 · Updated this week
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax · ☆516 · Updated this week
- Building blocks for foundation models. · ☆394 · Updated 10 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. · ☆715 · Updated last month
- For optimization algorithm research and development. · ☆449 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ☆483 · Updated 3 weeks ago
- A puzzle to learn about prompting · ☆121 · Updated last year
- A repository for research on medium-sized language models. · ☆479 · Updated this week
- TensorDict is a dedicated tensor container for PyTorch. · ☆840 · Updated this week
- Helpful tools and examples for working with flex-attention · ☆469 · Updated 3 weeks ago
- GPU-programming-related news and material links · ☆1,237 · Updated last month
- Minimalistic large language model 3D-parallelism training · ☆1,260 · Updated this week
- Fast & Simple repository for pre-training and fine-tuning T5-style models · ☆970 · Updated 3 months ago
- Pipeline Parallelism for PyTorch · ☆726 · Updated 2 months ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… · ☆333 · Updated 3 weeks ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day · ☆252 · Updated last year
- Tile primitives for speedy kernels · ☆1,658 · Updated this week
- 🤖 A PyTorch library of curated Transformer models and their composable components · ☆866 · Updated 7 months ago
- Tools for understanding how transformer predictions are built layer-by-layer · ☆430 · Updated 5 months ago
- Fast bare-bones BPE for modern tokenizer training · ☆142 · Updated 3 weeks ago
- Best practices & guides on how to write distributed PyTorch training code · ☆286 · Updated 2 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch · ☆476 · Updated 3 weeks ago
- Transformers with Arbitrarily Large Context · ☆641 · Updated 3 months ago
- Pax is a Jax-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimenta… · ☆457 · Updated last week