rwitten / HighPerfLLMs2024Links
☆547 · Updated last year
Alternatives and similar repositories for HighPerfLLMs2024
Users interested in HighPerfLLMs2024 are comparing it to the repositories listed below.
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆781 · Updated this week
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆687 · Updated last month
- Building blocks for foundation models. ☆585 · Updated last year
- ☆286 · Updated last year
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆542 · Updated this week
- What would you do with 1000 H100s... ☆1,133 · Updated last year
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆829 · Updated 4 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆153 · Updated 2 years ago
- Puzzles for exploring transformers ☆380 · Updated 2 years ago
- ☆225 · Updated last month
- Best practices & guides on how to write distributed pytorch training code ☆552 · Updated 2 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆587 · Updated 4 months ago
- Open-source framework for the research and development of foundation models. ☆673 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆459 · Updated 2 weeks ago
- seqax = sequence modeling + JAX ☆169 · Updated 5 months ago
- Minimal yet performant LLM examples in pure JAX ☆219 · Updated 3 weeks ago
- ☆460 · Updated last year
- ☆340 · Updated 2 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆195 · Updated 6 months ago
- Accelerate and optimize performance with streamlined training and serving options in JAX. ☆326 · Updated this week
- FlexAttention based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆327 · Updated last month
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆718 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆275 · Updated last month
- 🧱 Modula software package ☆316 · Updated 4 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆396 · Updated 6 months ago
- ☆533 · Updated 4 months ago
- Annotated version of the Mamba paper ☆492 · Updated last year
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆404 · Updated this week
- Simple MPI implementation for prototyping or learning ☆293 · Updated 4 months ago
- For optimization algorithm research and development. ☆553 · Updated this week