Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to larger models with one parameter change (feature currently in alpha).
☆355 · Updated Jul 29, 2024
Alternatives and similar repositories for hlb-gpt
Users interested in hlb-gpt are comparing it to the libraries listed below
- Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!) ☆1,300 · Updated Dec 18, 2024
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆190 · Updated Jan 19, 2026
- ☆145 · Updated Mar 31, 2023
- Minimal (400 LOC) implementation of maximum (multi-node, FSDP) GPT training ☆132 · Updated Apr 17, 2024
- Collection of autoregressive model implementations ☆85 · Updated Feb 23, 2026
- It's a baby compiler. (Lean btw.) ☆16 · Updated May 19, 2025
- Implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Updated Jul 3, 2024
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆32 · Updated Jun 5, 2025
- Demonstration that finetuning a RoPE model on sequences longer than its pre-training length extends the model's context limit ☆63 · Updated Jun 21, 2023
- NanoGPT (124M) in 2 minutes ☆4,848 · Updated this week
- Low-rank adapter extraction for fine-tuned transformer models ☆181 · Updated May 2, 2024
- Schedule-Free Optimization in PyTorch ☆2,265 · Updated May 21, 2025
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆1,018 · Updated Aug 21, 2024
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆698 · Updated Jan 26, 2026
- An implementation of Self-Extend, expanding the context window via grouped attention ☆119 · Updated Jan 7, 2024
- WIP ☆94 · Updated Aug 13, 2024
- ☆124 · Updated May 28, 2024
- Full finetuning of large language models without large memory requirements ☆94 · Updated Sep 22, 2025
- Supporting PyTorch FSDP for optimizers ☆84 · Updated Dec 8, 2024
- ☆306 · Updated Jul 15, 2024
- Entropy Based Sampling and Parallel CoT Decoding ☆3,434 · Updated Nov 13, 2024
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆598 · Updated Aug 12, 2025
- ☆54 · Updated May 20, 2024
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆282 · Updated Nov 24, 2025
- ☆63 · Updated Mar 4, 2022
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆365 · Updated Nov 15, 2025
- ☆317 · Updated Jun 21, 2024
- High-performance tokenized language data-loader (a Python C++ extension) ☆14 · Updated Jul 22, 2024
- ☆15 · Updated Oct 31, 2023
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆281 · Updated Nov 3, 2023
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆19 · Updated Jul 24, 2025
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated Jul 28, 2024
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,362 · Updated Jun 13, 2024
- PyTorch interface for TrueGrad Optimizers ☆43 · Updated Aug 8, 2023
- Fast reinforcement learning 💨 ☆28 · Updated Jul 15, 2025
- ☆48 · Updated Feb 23, 2025
- The repository for the code of the UltraFastBERT paper ☆519 · Updated Mar 24, 2024
- ☆13 · Updated Jun 18, 2024
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Updated Mar 31, 2024