rwitten / HighPerfLLMs2024
☆446 · Updated 9 months ago
Alternatives and similar repositories for HighPerfLLMs2024:
Users interested in HighPerfLLMs2024 are comparing it to the repositories listed below.
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax — ☆569 · Updated this week
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… — ☆491 · Updated last week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs — ☆258 · Updated last week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. — ☆536 · Updated last week
- ☆217 · Updated 9 months ago
- What would you do with 1000 H100s... — ☆1,043 · Updated last year
- ☆430 · Updated 6 months ago
- Best practices & guides on how to write distributed PyTorch training code — ☆408 · Updated 2 months ago
- seqax = sequence modeling + JAX — ☆155 · Updated last month
- Puzzles for exploring transformers — ☆344 · Updated 2 years ago
- Building blocks for foundation models. — ☆487 · Updated last year
- Puzzles for learning Triton — ☆1,603 · Updated 5 months ago
- jax-triton contains integrations between JAX and OpenAI Triton — ☆393 · Updated this week
- Fast bare-bones BPE for modern tokenizer training — ☆154 · Updated last month
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. — ☆788 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… — ☆244 · Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… — ☆322 · Updated this week
- Implementation of Diffusion Transformer (DiT) in JAX — ☆272 · Updated 10 months ago
- ☆202 · Updated last week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. — ☆130 · Updated last year
- Open weights language model from Google DeepMind, based on Griffin. — ☆636 · Updated 2 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand — ☆180 · Updated last week
- ☆155 · Updated last year
- Minimalistic 4D-parallelism distributed training framework for education purposes — ☆1,445 · Updated 2 months ago
- ☆181 · Updated 2 months ago
- ☆297 · Updated this week
- KernelBench: Can LLMs Write GPU Kernels? — a benchmark with Torch -> CUDA problems — ☆288 · Updated last week
- Cataloging released Triton kernels. — ☆220 · Updated 3 months ago
- A puzzle to learn about prompting — ☆127 · Updated last year
- An extension of the nanoGPT repository for training small MoE models. — ☆138 · Updated last month