PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
⭐986 · Updated 4 months ago
Alternatives and similar repositories for nanoT5:
Users who are interested in nanoT5 are comparing it to the libraries listed below.
- 🤗 A PyTorch library of curated Transformer models and their composable components ⭐873 · Updated 9 months ago
- Cramming the training of a (BERT-type) language model into limited compute. ⭐1,307 · Updated 7 months ago
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript ⭐562 · Updated 6 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ⭐541 · Updated 2 weeks ago
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ⭐1,075 · Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ⭐684 · Updated 9 months ago
- Fine-tune mistral-7B on 3090s, a100s, h100s ⭐704 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ⭐1,398 · Updated 9 months ago
- The official implementation of "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training" ⭐941 · Updated 11 months ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ⭐1,060 · Updated 10 months ago
- Batched LoRAs ⭐336 · Updated last year
- What would you do with 1000 H100s... ⭐948 · Updated last year
- Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch ⭐857 · Updated last year
- Minimalistic large language model 3D-parallelism training ⭐1,386 · Updated this week
- An open collection of implementation tips, tricks and resources for training large language models ⭐466 · Updated last year
- Tune any FALCON in 4-bit ⭐466 · Updated last year
- A repository for research on medium-sized language models. ⭐484 · Updated this week
- Inference code for Persimmon-8B ⭐416 · Updated last year
- Distributed trainer for LLMs ⭐555 · Updated 7 months ago
- Build, evaluate, understand, and fix LLM-based apps ⭐484 · Updated last year
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ⭐439 · Updated 8 months ago
- ⭐413 · Updated last year
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ⭐253 · Updated last year
- Language Modeling with the H3 State Space Model ⭐516 · Updated last year
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ⭐628 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ⭐2,339 · Updated last month
- Finetuning Large Language Models on One Consumer GPU in 2 Bits ⭐714 · Updated 7 months ago
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch ⭐632 · Updated 3 weeks ago
- Creative interactive views of any dataset. ⭐831 · Updated 3 weeks ago