100M tokens. Infinite compute. Lowest val loss wins.
☆456 · May 5, 2026 · Updated this week
Alternatives and similar repositories for slowrun
Users that are interested in slowrun are comparing it to the libraries listed below.
- Code for "What really matters in matrix-whitening optimizers?" ☆23 · Oct 31, 2025 · Updated 6 months ago
- My attempt to improve the speed of the newton schulz algorithm, starting from the dion implementation. ☆38 · Apr 30, 2026 · Updated last week
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes. ☆22 · Feb 19, 2026 · Updated 2 months ago
- Leo optimizer, variation of Muon that runs faster ☆59 · Sep 6, 2025 · Updated 8 months ago
- A power user focused interface for LLM base models. ☆68 · Apr 22, 2026 · Updated 2 weeks ago
- ☆58 · Mar 13, 2026 · Updated last month
- The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size ☆19 · May 19, 2019 · Updated 6 years ago
- ☆37 · Feb 26, 2024 · Updated 2 years ago
- Efficient Scaling laws and collaborative pretraining. ☆22 · Sep 18, 2025 · Updated 7 months ago
- Benchmarking Optimizers for LLM Pretraining ☆57 · May 3, 2026 · Updated last week
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆140 · Updated this week
- Code for minimum-entropy coupling. ☆33 · Jan 6, 2026 · Updated 4 months ago
- ☆16 · Aug 7, 2024 · Updated last year
- ☆267 · Dec 2, 2024 · Updated last year
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆30 · Feb 6, 2026 · Updated 3 months ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- [NeurIPS 2025, Spotlight]: Ambient-o: Training Good models with Bad Data. ☆34 · Apr 6, 2026 · Updated last month
- ☆56 · Updated this week
- Automatically review Claude Code plans using external AI CLIs ☆56 · Mar 2, 2026 · Updated 2 months ago
- Polynomial semantics of linear logic ☆13 · Apr 15, 2018 · Updated 8 years ago
- Timelight: Universal Path Generator ☆23 · Aug 24, 2025 · Updated 8 months ago
- Simple Transformer in Jax ☆143 · Jun 22, 2024 · Updated last year
- Haskell port of the Tensor Algebra COmpiler ☆16 · Nov 18, 2019 · Updated 6 years ago
- JAX implementation of VQVAE/VQGAN autoencoders (+FSQ) ☆41 · Jun 6, 2024 · Updated last year
- ☆20 · Jan 27, 2024 · Updated 2 years ago
- ☆35 · Jul 5, 2023 · Updated 2 years ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 · Mar 4, 2024 · Updated 2 years ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Flax (JAX) implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation ☆12 · May 24, 2021 · Updated 4 years ago
- NanoGPT (124M) in 90 seconds ☆5,200 · May 4, 2026 · Updated last week
- Haskell implementation of open games ☆13 · Apr 20, 2016 · Updated 10 years ago
- supporting pytorch FSDP for optimizers ☆84 · Dec 8, 2024 · Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆152 · Oct 2, 2025 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆192 · Jan 19, 2026 · Updated 3 months ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively. ☆73 · Apr 15, 2026 · Updated 3 weeks ago
- ADAG: Transluce's MLP neuron-level circuit tracing library ☆25 · Apr 10, 2026 · Updated last month
- Orca is a workspace for vibe coding built upon the principles of tracking what the agent changes and only keeping what you want ☆61 · Updated this week
- Haskell graph library ☆10 · Dec 18, 2017 · Updated 8 years ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆86 · Jan 12, 2025 · Updated last year