100M tokens. Infinite compute. Lowest val loss wins.
☆472 · May 28, 2026 · Updated this week
Alternatives and similar repositories for slowrun
Users that are interested in slowrun are comparing it to the libraries listed below.
- Code for "What really matters in matrix-whitening optimizers?" ☆24 · Oct 31, 2025 · Updated 7 months ago
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. ☆38 · Apr 30, 2026 · Updated last month
- ☆26 · Feb 20, 2026 · Updated 3 months ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes. ☆22 · Feb 19, 2026 · Updated 3 months ago
- Leo optimizer, a variation of Muon that runs faster ☆59 · Sep 6, 2025 · Updated 8 months ago
- A power-user-focused interface for LLM base models. ☆70 · Updated this week
- ☆10 · Oct 24, 2024 · Updated last year
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 · Apr 15, 2024 · Updated 2 years ago
- Voxtral: Convert Mistral into an end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP… ☆50 · Mar 7, 2025 · Updated last year
- ☆57 · Mar 13, 2026 · Updated 2 months ago
- Trains small LMs. Designed for training on SimpleStories ☆14 · Sep 15, 2025 · Updated 8 months ago
- Toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts ☆31 · Sep 1, 2024 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆192 · Jan 19, 2026 · Updated 4 months ago
- Benchmarking Optimizers for LLM Pretraining ☆59 · May 3, 2026 · Updated 3 weeks ago
- Add ability to interrupt own message ☆14 · Apr 21, 2024 · Updated 2 years ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆142 · May 6, 2026 · Updated 3 weeks ago
- ☆268 · Dec 2, 2024 · Updated last year
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆30 · Feb 6, 2026 · Updated 3 months ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- [NeurIPS 2025, Spotlight]: Ambient-o: Training Good models with Bad Data. ☆34 · Apr 6, 2026 · Updated last month
- Open-sourced evaluation suite from the Monitoring Monitorability paper ☆76 · Apr 22, 2026 · Updated last month
- Simple Transformer in Jax ☆143 · Jun 22, 2024 · Updated last year
- ☆20 · Jan 27, 2024 · Updated 2 years ago
- ☆28 · Oct 7, 2025 · Updated 7 months ago
- ☆35 · Jul 5, 2023 · Updated 2 years ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 · Mar 4, 2024 · Updated 2 years ago
- Model REVOLVER, a human in the loop model mixing system. ☆33 · Aug 2, 2023 · Updated 2 years ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Flax (JAX) implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation ☆12 · May 24, 2021 · Updated 5 years ago
- NanoGPT (124M) in 90 seconds ☆5,307 · May 25, 2026 · Updated last week
- extending laughbot project to encoder-based transformer model finetuned on same dataset for humor classification ☆10 · Jan 4, 2023 · Updated 3 years ago
- supporting pytorch FSDP for optimizers ☆84 · Dec 8, 2024 · Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆152 · Oct 2, 2025 · Updated 8 months ago
- A GPU accelerated Mandelbrot viewer made using the new WebGPU API. ☆10 · Oct 26, 2023 · Updated 2 years ago
- ☆67 · Apr 12, 2025 · Updated last year
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively. ☆73 · Apr 15, 2026 · Updated last month
- official code for paper Probing the Decision Boundaries of In-context Learning in Large Language Models. https://arxiv.org/abs/2406.11233… ☆20 · Jul 27, 2025 · Updated 10 months ago
- My recipes for doing continuous wavelet and biwavelet analysis. ☆13 · Feb 4, 2022 · Updated 4 years ago
- UD Greek ☆22 · May 6, 2026 · Updated 3 weeks ago