Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).
☆355 · updated Jul 29, 2024
Alternatives and similar repositories for hlb-gpt
Users interested in hlb-gpt are comparing it to the libraries listed below.
- Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!) (☆1,299 · updated Dec 18, 2024)
- ☆145 · updated Mar 31, 2023
- The simplest, fastest repository for training/finetuning medium-sized GPTs. (☆187 · updated Jan 19, 2026)
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training (☆132 · updated Apr 17, 2024)
- Collection of autoregressive model implementations (☆85 · updated this week)
- ☆292 · updated Jul 15, 2024
- Demonstration that finetuning a RoPE model on sequences longer than those seen in pre-training extends the model's context limit (☆63 · updated Jun 21, 2023)
- An implementation of Self-Extend, expanding the context window via grouped attention (☆119 · updated Jan 7, 2024)
- Low-rank adapter extraction for fine-tuned transformer models (☆180 · updated May 2, 2024)
- Schedule-Free Optimization in PyTorch (☆2,256 · updated May 21, 2025)
- Full finetuning of large language models without large memory requirements (☆94 · updated Sep 22, 2025)
- Code for the paper "Function-Space Learning Rates" (☆25 · updated Jun 3, 2025)
- Fast & Simple repository for pre-training and fine-tuning T5-style models (☆1,017 · updated Aug 21, 2024)
- [WIP] Transformer to embed Danbooru labelsets (☆13 · updated Mar 31, 2024)
- WIP (☆94 · updated Aug 13, 2024)
- ☆316 · updated Jun 21, 2024
- ☆124 · updated May 28, 2024
- seqax = sequence modeling + JAX (☆171 · updated Jul 23, 2025)
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax (☆693 · updated Jan 26, 2026)
- Just a bunch of benchmark logs for different LLMs (☆119 · updated Jul 28, 2024)
- Entropy Based Sampling and Parallel CoT Decoding (☆3,434 · updated Nov 13, 2024)
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. (☆595 · updated Aug 12, 2025)
- NanoGPT (124M) in 2 minutes (☆4,679 · updated this week)
- Supporting PyTorch FSDP for optimizers (☆84 · updated Dec 8, 2024)
- Implementation of https://arxiv.org/pdf/2312.09299 (☆21 · updated Jul 3, 2024)
- Cramming the training of a (BERT-type) language model into limited compute. (☆1,363 · updated Jun 13, 2024)
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds (☆359 · updated Nov 15, 2025)
- Implementation of Diffusion Transformer (DiT) in JAX (☆305 · updated Jun 11, 2024)
- Memory-efficient transformer. Work in progress. (☆19 · updated Sep 17, 2022)
- HomebrewNLP in JAX flavour for maintainable TPU training (☆51 · updated Jan 20, 2024)
- PyTorch interface for TrueGrad Optimizers (☆43 · updated Aug 8, 2023)
- ☆53 · updated May 20, 2024
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. (☆32 · updated Jun 5, 2025)
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆280 · updated Nov 24, 2025)
- A MAD laboratory to improve AI architecture designs 🧪 (☆138 · updated Dec 17, 2024)
- ☆50 · updated Mar 14, 2024
- Tiny re-implementation of MDM in the style of LLaDA and the nanoGPT speedrun (☆57 · updated Mar 10, 2025)
- 🤖 A PyTorch library of curated Transformer models and their composable components (☆894 · updated Apr 17, 2024)
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free (☆233 · updated Oct 31, 2024)