Liuhong99 / SophiaLinks

The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”

☆967

Alternatives and similar repositories for Sophia

Users that are interested in Sophia are comparing it to the libraries listed below

Sorting:

kyegomez / Sophia
Effortless plugin and play Optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
☆379Updated last year
HazyResearch / safari
Convolutions for Sequence Modeling
☆895Updated last year
JonasGeiping / cramming
Cramming the training of a (BERT-type) language model into limited compute.
☆1,345Updated last year
microsoft / mup
maximal update parametrization (µP)
☆1,584Updated last year
lucidrains / MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
☆649Updated 7 months ago
PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
☆1,007Updated last year
princeton-nlp / MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
☆1,120Updated last year
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆557Updated 7 months ago
srush / annotated-mamba
Annotated version of the Mamba paper
☆487Updated last year
HazyResearch / H3
Language Modeling with the H3 State Space Model
☆519Updated last year
changjonathanc / minLoRA
minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.
☆474Updated 2 years ago
lucidrains / recurrent-memory-transformer-pytorch
Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch
☆413Updated 7 months ago
lucidrains / memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate …
☆634Updated 2 years ago
abertsch72 / unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
☆1,062Updated last year
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆536Updated 3 months ago
OpenLMLab / LOMO
LOMO: LOw-Memory Optimization
☆989Updated last year
mlfoundations / open_lm
A repository for research on medium sized language models.
☆509Updated 2 months ago
lucidrains / lion-pytorch
🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch
☆2,153Updated 8 months ago
ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab…
☆1,581Updated last year
kuleshov-group / llmtools
Finetuning Large Language Models on One Consumer GPU in 2 Bits
☆728Updated last year
Guitaricet / relora
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
☆461Updated last year
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆359Updated last year
alasdairforsythe / tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
☆599Updated last year
Vahe1994 / SpQR
☆546Updated 8 months ago
facebookresearch / optimizers
For optimization algorithm research and development.
☆530Updated this week
facebookresearch / dadaptation
D-Adaptation for SGD, Adam and AdaGrad
☆523Updated 7 months ago
haoliuhl / ringattention
Large Context Attention
☆727Updated 7 months ago
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆539Updated last year
google / learned_optimization
☆783Updated 2 months ago
kyegomez / LongNet
Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
☆709Updated last year