Niccolo-Ajroldi / plainLM
Minimal pretraining script for language modeling in PyTorch, supporting torch.compile and DDP. It includes a model implementation and a data preprocessing script.
⭐41 · Updated last month
Alternatives and similar repositories for plainLM
Users interested in plainLM are comparing it to the libraries listed below.
- ⭐246 · Updated last year
- 🧱 Modula software package ⭐322 · Updated 5 months ago
- ⭐289 · Updated last year
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ⭐191 · Updated 3 weeks ago
- nanoGPT-like codebase for LLM training ⭐113 · Updated 2 months ago
- ⭐234 · Updated 11 months ago
- supporting pytorch FSDP for optimizers ⭐84 · Updated last year
- ⭐52 · Updated last month
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ⭐349 · Updated 2 months ago
- Accelerated First Order Parallel Associative Scan ⭐196 · Updated 3 weeks ago
- A library for unit scaling in PyTorch ⭐133 · Updated 6 months ago
- Maximal Update Parametrization (μP) with Flax & Optax. ⭐16 · Updated 2 years ago
- Pytorch code for experiments on Linear Transformers ⭐25 · Updated 2 years ago
- ⭐13 · Updated last month
- Minimal but scalable implementation of large language models in JAX ⭐35 · Updated 2 months ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ⭐406 · Updated this week
- Jax/Flax rewrite of Karpathy's nanoGPT ⭐63 · Updated 2 years ago
- ⭐73 · Updated last year
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ⭐63 · Updated 10 months ago
- ⭐62 · Updated last year
- Efficient optimizers ⭐281 · Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ⭐135 · Updated last year
- A simple library for scaling up JAX programs ⭐145 · Updated 3 months ago
- Parameter-Free Optimizers for Pytorch ⭐130 · Updated last year
- seqax = sequence modeling + JAX ⭐170 · Updated 6 months ago
- ⭐132 · Updated 2 weeks ago
- Implementation of PSGD optimizer in JAX ⭐35 · Updated last year
- ⭐53 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐186 · Updated 2 weeks ago
- ⭐92 · Updated last year