Niccolo-Ajroldi / plainLM
Minimal pretraining script for language modeling in PyTorch. Supports torch compilation and DDP, and includes a model implementation and a data preprocessing script.
☆39 · Updated 3 weeks ago
Alternatives and similar repositories for plainLM
Users who are interested in plainLM are comparing it to the repositories listed below.
- nanoGPT-like codebase for LLM training ☆111 · Updated 3 weeks ago
- ☆234 · Updated 9 months ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 · Updated last month
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆327 · Updated 2 weeks ago
- ☆285 · Updated last year
- 🧱 Modula software package ☆307 · Updated 3 months ago
- supporting pytorch FSDP for optimizers ☆84 · Updated 11 months ago
- ☆47 · Updated last month
- ☆61 · Updated last year
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 · Updated 2 years ago
- ☆224 · Updated 11 months ago
- Minimal but scalable implementation of large language models in JAX ☆35 · Updated this week
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆63 · Updated 8 months ago
- ☆72 · Updated 11 months ago
- A library for unit scaling in PyTorch ☆132 · Updated 4 months ago
- Accelerated First Order Parallel Associative Scan ☆192 · Updated last year
- seqax = sequence modeling + JAX ☆168 · Updated 4 months ago
- A simple library for scaling up JAX programs ☆144 · Updated 3 weeks ago
- Jax/Flax rewrite of Karpathy's nanoGPT ☆62 · Updated 2 years ago
- Supporting code for the blog post on modular manifolds. ☆103 · Updated 2 months ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆401 · Updated this week
- ☆13 · Updated last month
- Implementation of PSGD optimizer in JAX ☆35 · Updated 11 months ago
- Pytorch code for experiments on Linear Transformers ☆23 · Updated last year
- Minimal yet performant LLM examples in pure JAX ☆202 · Updated 2 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆174 · Updated 5 months ago
- 📄Small Batch Size Training for Language Models ☆64 · Updated last month
- ASDL: Automatic Second-order Differentiation Library for PyTorch ☆190 · Updated 11 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 · Updated 11 months ago
- ☆53 · Updated last year