Niccolo-Ajroldi / plainLM
Minimal pretraining script for language modeling in PyTorch, supporting torch compilation and DDP. It includes a model implementation and data preprocessing.
☆27 Updated 2 weeks ago
Alternatives and similar repositories for plainLM
Users who are interested in plainLM are comparing it to the libraries listed below.
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆179 Updated last month
- ☆10 Updated 2 months ago
- 🧱 Modula software package ☆204 Updated 3 months ago
- nanoGPT-like codebase for LLM training ☆100 Updated 2 months ago
- ☆197 Updated 7 months ago
- ☆53 Updated 9 months ago
- ☆17 Updated last year
- ☆230 Updated 5 months ago
- ☆26 Updated 2 weeks ago
- Implementation of the "Online learning of long-range dependencies" paper, NeurIPS 2023 ☆18 Updated 8 months ago
- Supporting PyTorch FSDP for optimizers ☆82 Updated 7 months ago
- Parameter-Free Optimizers for PyTorch ☆130 Updated last year
- ☆70 Updated 7 months ago
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆62 Updated 4 years ago
- ASDL: Automatic Second-order Differentiation Library for PyTorch ☆188 Updated 7 months ago
- ☆53 Updated last year
- Minimal but scalable implementation of large language models in JAX ☆35 Updated last week
- ☆273 Updated last year
- Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization) ☆14 Updated last year
- ☆40 Updated last year
- A library for unit scaling in PyTorch ☆125 Updated this week
- Accelerated First Order Parallel Associative Scan ☆182 Updated 10 months ago
- JAX/Flax rewrite of Karpathy's nanoGPT ☆59 Updated 2 years ago
- Agustinus' very opinionated publication-ready plotting library ☆67 Updated 2 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆84 Updated last year
- Maximal Update Parametrization (μP) with Flax & Optax. ☆11 Updated last year
- LoRA for arbitrary JAX models and functions ☆140 Updated last year
- ☆51 Updated last year
- Open source code for EigenGame. ☆30 Updated 2 years ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 Updated last year