alexjc / nanogpt-speedrun
NanoGPT (124M) in 5 minutes
☆9Updated 2 months ago
Alternatives and similar repositories for nanogpt-speedrun:
Users who are interested in nanogpt-speedrun are comparing it to the libraries listed below
- TensorLens☆10Updated this week
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.☆30Updated this week
- ☆21Updated 5 months ago
- ☆52Updated last month
- ☆41Updated 2 months ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆29Updated last week
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.☆17Updated last month
- ☆32Updated this week
- ☆49Updated last year
- Utilities for PyTorch distributed☆24Updated last month
- microjax: a micro, JAX-like function transformation engine☆30Updated 5 months ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single-machine microbatches, in PyTorch☆23Updated 3 months ago
- ☆48Updated 2 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore☆26Updated 7 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆39Updated 2 months ago
- Demonstration that fine-tuning a RoPE model on longer sequences than the pre-trained model saw adapts the model's context limit☆63Updated last year
- Collection of autoregressive model implementations☆85Updated 2 months ago
- Triton implementation of the HyperAttention algorithm☆47Updated last year
- Focused on fast experimentation and simplicity☆71Updated 4 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs☆49Updated last week
- ☆19Updated 3 weeks ago
- QLoRA for Masked Language Modeling☆22Updated last year
- A collection of optimizers for MLX☆35Updated last month
- Explorations into adversarial losses on top of autoregressive loss for language modeling☆35Updated 2 months ago
- An implementation of the Llama architecture, to instruct and delight☆21Updated 3 months ago
- DPO, but faster 🚀☆40Updated 4 months ago
- An introduction to LLM Sampling☆77Updated 4 months ago
- An open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆96Updated last month
- ☆47Updated 7 months ago
- Implementation of Spectral State Space Models☆16Updated last year