EleutherAI / nanoGPT-mupLinks

The simplest, fastest repository for training/finetuning medium-sized GPTs.

☆166

Alternatives and similar repositories for nanoGPT-mup

Users that are interested in nanoGPT-mup are comparing it to the libraries listed below

Sorting:

athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆131Updated 10 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233Updated 3 weeks ago
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆107Updated 5 months ago
cloneofsimo / min-fsdp
☆91Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83Updated 10 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132Updated last year
berlino / seq_icl
☆53Updated last year
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241Updated 4 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192Updated 11 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84Updated 11 months ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆192Updated last year
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆63Updated 2 weeks ago
google-deepmind / mishax
☆142Updated last month
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated last month
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆85Updated 3 years ago
Aleph-Alpha-Research / scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…
☆64Updated 3 weeks ago
google-deepmind / nanodo
☆283Updated last year
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆216Updated last year
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆179Updated 4 months ago
shikaiqiu / compute-better-spent
☆58Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 10 months ago
RobertCsordas / moeut
☆86Updated last year
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆69Updated last year
insuhan / hyper-attn
☆83Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆102Updated 3 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129Updated last year
JackCai1206 / arithmetic-self-improve
☆36Updated 8 months ago
meta-pytorch / torchforge
PyTorch-native post-training at scale
☆83Updated this week
huyphan168 / PEER
Mixture of A Million Experts
☆48Updated last year