Niccolo-Ajroldi / plainLM
Minimal pretraining script for language modeling in PyTorch. Supports torch compilation and DDP. Includes a model implementation and data preprocessing.
☆36 Updated 2 weeks ago
Alternatives and similar repositories for plainLM
Users who are interested in plainLM are comparing it to the libraries listed below.
- ☆234 Updated 8 months ago
- nanoGPT-like codebase for LLM training ☆110 Updated this week
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 Updated 2 weeks ago
- ☆45 Updated last week
- supporting pytorch FSDP for optimizers ☆83 Updated 10 months ago
- ☆283 Updated last year
- Parameter-Free Optimizers for Pytorch ☆131 Updated last year
- ☆58 Updated last year
- 🧱 Modula software package ☆299 Updated 2 months ago
- ☆220 Updated 11 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆320 Updated 3 months ago
- A library for unit scaling in PyTorch ☆132 Updated 3 months ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆400 Updated last week
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 Updated 2 years ago
- ☆71 Updated 10 months ago
- Accelerated First Order Parallel Associative Scan ☆189 Updated last year
- ASDL: Automatic Second-order Differentiation Library for PyTorch ☆190 Updated 10 months ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆35 Updated 3 years ago
- ☆13 Updated last week
- ☆18 Updated last year
- seqax = sequence modeling + JAX ☆168 Updated 3 months ago
- ☆53 Updated last year
- ☆120 Updated 4 months ago
- Implementation of PSGD optimizer in JAX ☆35 Updated 10 months ago
- Parallelizing non-linear sequential models over the sequence length ☆54 Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆132 Updated 10 months ago
- Supporting code for the blog post on modular manifolds. ☆94 Updated last month
- LoRA for arbitrary JAX models and functions ☆141 Updated last year
- The official code of "Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers" ☆19 Updated last year
- Omnigrok: Grokking Beyond Algorithmic Data ☆62 Updated 2 years ago