cloneofsimo/min-max-gpt

Minimal (400 LOC) implementation, Maximum (multi-node, FSDP) GPT training

☆132

Alternatives and similar repositories for min-max-gpt

Users who are interested in min-max-gpt compare it to the repositories listed below.

cloneofsimo / min-fsdp
☆93 · Jul 5, 2024 · Updated 2 years ago
cloneofsimo / min-max-in-dit
☆27 · May 3, 2024 · Updated 2 years ago
cloneofsimo / project_RF
☆24 · Jun 4, 2024 · Updated 2 years ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 · Jan 8, 2024 · Updated 2 years ago
cloneofsimo / imagenet.int8
☆40 · Apr 27, 2024 · Updated 2 years ago
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, don't use it for Adam
☆88 · Jul 28, 2024 · Updated last year
cloneofsimo / d3pm
Minimal Implementation of a D3PM in pytorch
☆308 · Apr 22, 2024 · Updated 2 years ago
cloneofsimo / scaling-guide
WIP
☆96 · Aug 13, 2024 · Updated last year
cloneofsimo / karras-power-ema-tutorial
☆53 · Jan 6, 2024 · Updated 2 years ago
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆57 · Mar 10, 2025 · Updated last year
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆135 · Jul 11, 2025 · Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84 · Dec 8, 2024 · Updated last year
Chillee / lit-llama
Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code
☆10 · Aug 29, 2023 · Updated 2 years ago
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated 2 years ago
fal-ai / lavender-data
Load & manage evolving datasets efficiently
☆22 · Aug 22, 2025 · Updated 11 months ago
mgmalek / efficient_cross_entropy
☆124 · May 28, 2024 · Updated 2 years ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆58 · Aug 20, 2024 · Updated last year
cloneofsimo / insightful-nn-papers
These papers will provide unique insightful concepts that will broaden your perspective on neural networks and deep learning
☆48 · Sep 3, 2023 · Updated 2 years ago
cloneofsimo / minRF
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
☆641 · Jul 1, 2024 · Updated 2 years ago
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20 · Oct 9, 2022 · Updated 3 years ago
cloneofsimo / efae
☆24 · Jun 18, 2024 · Updated 2 years ago
ethansmith2000 / clip-text-directions
☆20 · May 29, 2026 · Updated last month
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆17 · Jan 2, 2024 · Updated 2 years ago
cloneofsimo / zeroshampoo
☆33 · Sep 10, 2024 · Updated last year
glassroom / heinsen_sequence
Code implementing "Efficient Parallelization of a Ubiquitious Sequential Computation" (Heinsen, 2023)
☆98 · Dec 5, 2024 · Updated last year
lucaslingle / mu_transformer
Official implementation of 'A Large-Scale Exploration of mu-Transfer' (CoRR 2024)
☆31 · Jun 5, 2025 · Updated last year
lucidrains / taylor-series-linear-attention
Explorations into the recently proposed Taylor Series Linear Attention
☆101 · Aug 18, 2024 · Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Jun 3, 2025 · Updated last year
radarFudan / mamba-minimal-jax
☆36 · Nov 22, 2024 · Updated last year
google-deepmind / nanodo
☆304 · Jul 15, 2024 · Updated 2 years ago
Doraemonzzz / hgru-pytorch
☆29 · Jul 9, 2024 · Updated 2 years ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 · Oct 5, 2024 · Updated last year
cloneofsimo / minSAE
☆30 · Dec 2, 2024 · Updated last year
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆73 · Sep 25, 2024 · Updated last year
Doraemonzzz / nanoTransNormer
☆11 · Oct 11, 2023 · Updated 2 years ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 · Apr 17, 2024 · Updated 2 years ago
tinkoff-ai / lb-sac
Official implementation for "Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size", NeurIPS 2022, Offline RL Worksho…
☆21 · Feb 27, 2023 · Updated 3 years ago
Doraemonzzz / hgru2-pytorch
☆24 · Sep 25, 2024 · Updated last year
HazyResearch / train-tk
train with kittens!
☆67 · Oct 25, 2024 · Updated last year