PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
⭐986 · Updated 4 months ago
Alternatives and similar repositories for nanoT5:
Users who are interested in nanoT5 are comparing it to the libraries listed below.
- 🤗 A PyTorch library of curated Transformer models and their composable components ⭐873 · Updated 9 months ago
- Cramming the training of a (BERT-type) language model into limited compute. ⭐1,307 · Updated 7 months ago
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript ⭐562 · Updated 6 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ⭐541 · Updated 2 weeks ago
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ⭐1,075 · Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ⭐684 · Updated 9 months ago
- Fine-tune mistral-7B on 3090s, a100s, h100s ⭐704 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ⭐1,398 · Updated 9 months ago
- The official implementation of "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training" ⭐941 · Updated 11 months ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ⭐1,060 · Updated 10 months ago
- Batched LoRAs ⭐336 · Updated last year
- What would you do with 1000 H100s... ⭐948 · Updated last year
- Implementation of RETRO, DeepMind's retrieval-based attention net, in PyTorch ⭐857 · Updated last year
- Minimalistic large language model 3D-parallelism training ⭐1,386 · Updated this week
- An open collection of implementation tips, tricks and resources for training large language models ⭐466 · Updated last year
- Tune any FALCON in 4-bit ⭐466 · Updated last year
- A repository for research on medium-sized language models. ⭐484 · Updated this week
- Inference code for Persimmon-8B ⭐416 · Updated last year
- Distributed trainer for LLMs ⭐555 · Updated 7 months ago
- Build, evaluate, understand, and fix LLM-based apps ⭐484 · Updated last year
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ⭐439 · Updated 8 months ago
- ⭐413 · Updated last year
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ⭐253 · Updated last year
- Language Modeling with the H3 State Space Model ⭐516 · Updated last year
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ⭐628 · Updated last year
- The hub for EleutherAI's work on interpretability and learning dynamics ⭐2,339 · Updated last month
- Finetuning Large Language Models on One Consumer GPU in 2 Bits ⭐714 · Updated 7 months ago
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch ⭐632 · Updated 3 weeks ago
- Creative interactive views of any dataset. ⭐831 · Updated 3 weeks ago