PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
☆1,000 · Updated 7 months ago
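For orientation, the sketch below shows what a single fine-tuning step for a T5-style model looks like. It is written against the widely used Hugging Face `transformers` API rather than nanoT5's own training loop; the checkpoint name, learning rate, and toy example are illustrative assumptions, not taken from the nanoT5 repository.

```python
# Minimal illustrative sketch: one seq2seq fine-tuning step for a T5-style model.
# Not nanoT5's actual training code; model name and hyperparameters are assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-small")
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One toy training step on a single input/target pair.
inputs = tokenizer(
    "summarize: The quick brown fox jumps over the lazy dog.",
    return_tensors="pt",
)
labels = tokenizer("A fox jumps over a dog.", return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

nanoT5 wraps this kind of loop with its own configuration, data pipeline, and optimizer choices; see the repository itself for the actual entry points.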
Alternatives and similar repositories for nanoT5:
Users interested in nanoT5 are comparing it to the libraries listed below.
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,325 · Updated 9 months ago
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript ☆574 · Updated 8 months ago
- 🤗 A PyTorch library of curated Transformer models and their composable components ☆883 · Updated 11 months ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,059 · Updated last year
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ☆631 · Updated last year
- What would you do with 1000 H100s... ☆1,021 · Updated last year
- Implementation of RETRO, DeepMind's Retrieval based Attention net, in Pytorch ☆859 · Updated last year
- The official implementation of "Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training" ☆955 · Updated last year
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,095 · Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆691 · Updated 11 months ago
- The hub for EleutherAI's work on interpretability and learning dynamics ☆2,429 · Updated 2 weeks ago
- minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model. ☆454 · Updated last year
- A modular RL library to fine-tune language models to human preferences ☆2,294 · Updated last year
- Tune any FALCON in 4-bit ☆466 · Updated last year
- A repository for research on medium sized language models. ☆493 · Updated 2 months ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,451 · Updated 11 months ago
- An open collection of implementation tips, tricks and resources for training large language models ☆471 · Updated 2 years ago
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,560 · Updated last year
- Code for fine-tuning Platypus fam LLMs using LoRA ☆628 · Updated last year
- Finetuning Large Language Models on One Consumer GPU in 2 Bits ☆720 · Updated 10 months ago
- A crude RLHF layer on top of nanoGPT with Gumbel-Softmax trick ☆289 · Updated last year
- Language Modeling with the H3 State Space Model ☆517 · Updated last year
- Batched LoRAs ☆340 · Updated last year
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca) ☆1,113 · Updated last year
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆638 · Updated 3 months ago
- Fine-tune Mistral-7B on 3090s, A100s, H100s ☆709 · Updated last year
- The repository for the code of the UltraFastBERT paper ☆517 · Updated last year
- Minimalistic large language model 3D-parallelism training ☆1,715 · Updated this week
- Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs. ☆381 · Updated 9 months ago
- LOMO: LOw-Memory Optimization ☆981 · Updated 8 months ago