Niccolo-Ajroldi / plainLM
Minimal pretraining script for language modeling in PyTorch, with support for torch compilation and DDP. It includes a model implementation and a data preprocessing script.
☆42 Updated last month
Alternatives and similar repositories for plainLM
Users who are interested in plainLM are comparing it to the libraries listed below.
- ☆233 Updated 10 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆340 Updated last month
- nanoGPT-like codebase for LLM training ☆114 Updated 2 months ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 Updated 2 weeks ago
- ☆234 Updated last year
- Accelerated First Order Parallel Associative Scan ☆192 Updated this week
- 🧱 Modula software package ☆322 Updated 4 months ago
- ☆52 Updated 3 weeks ago
- supporting pytorch FSDP for optimizers ☆84 Updated last year
- ☆73 Updated last year
- Pytorch code for experiments on Linear Transformers ☆25 Updated last year
- ☆62 Updated last year
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆406 Updated this week
- Parameter-Free Optimizers for Pytorch ☆130 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆136 Updated last year
- ☆287 Updated last year
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆63 Updated 9 months ago
- ☆115 Updated 2 weeks ago
- Efficient Riemannian Optimization on Stiefel Manifold via Cayley Transform ☆44 Updated 6 years ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆181 Updated 6 months ago
- Jax/Flax rewrite of Karpathy's nanoGPT ☆62 Updated 2 years ago
- Supporting code for the blog post on modular manifolds. ☆109 Updated 3 months ago
- A library for unit scaling in PyTorch ☆133 Updated 6 months ago
- Implementation of the "Online learning of long-range dependencies" paper, NeurIPS 2023 ☆21 Updated last year
- ☆53 Updated last year
- ASDL: Automatic Second-order Differentiation Library for PyTorch ☆191 Updated last year
- ☆20 Updated last year
- Distributed K-FAC preconditioner for PyTorch ☆93 Updated 2 weeks ago
- ☆12 Updated last year
- ☆40 Updated 2 years ago