Niccolo-Ajroldi / plainLMLinks

Minimal pretraining script for language modeling in PyTorch. Supporting torch compilation and DDP. It includes a model implementation and a data preprocessing script.

☆39

Alternatives and similar repositories for plainLM

Users that are interested in plainLM are comparing it to the libraries listed below

Sorting:

epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆111Updated 3 weeks ago
google-research / jaxpruner
☆234Updated 9 months ago
lixilinx / psgd_torch
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆188Updated last month
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆327Updated 2 weeks ago
google-deepmind / nanodo
☆285Updated last year
modula-systems / modula
🧱 Modula software package
☆307Updated 3 months ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 11 months ago
LIONS-EPFL / scion
☆47Updated last month
shikaiqiu / compute-better-spent
☆61Updated last year
formll / dog
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule
☆63Updated 2 years ago
nikhilvyas / SOAP
☆224Updated 11 months ago
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated this week
zyushun / hessian-spectrum
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
☆63Updated 8 months ago
locuslab / edge-of-stability
☆72Updated 11 months ago
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆132Updated 4 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆192Updated last year
MatX-inc / seqax
seqax = sequence modeling + JAX
☆168Updated 4 months ago
young-geng / scalax
A simple library for scaling up JAX programs
☆144Updated 3 weeks ago
jenkspt / gpt-jax
Jax/Flax rewrite of Karpathy's nanoGPT
☆62Updated 2 years ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆103Updated 2 months ago
mlcommons / algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement…
☆401Updated this week
ZQZCalin / trainit
☆13Updated last month
evanatyourservice / psgd_jax
Implementation of PSGD optimizer in JAX
☆35Updated 11 months ago
chengxiang / LinearTransformer
Pytorch code for experiments on Linear Transformers
☆23Updated last year
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆202Updated 2 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆64Updated last month
kazukiosawa / asdl
ASDL: Automatic Second-order Differentiation Library for PyTorch
☆190Updated 11 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135Updated 11 months ago
berlino / seq_icl
☆53Updated last year