PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
☆1,003 · Updated 8 months ago
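As a rough illustration of the workflow nanoT5 targets, the sketch below runs one supervised fine-tuning step for a T5-style model using the Hugging Face Transformers API. This is not nanoT5's own entry point; the `t5-small` checkpoint, the toy input/target pair, and the plain AdamW step are assumptions chosen only to keep the example self-contained.

```python
# Minimal sketch: one fine-tuning step for a T5-style model with Hugging Face
# Transformers. This is NOT nanoT5's own training script; the checkpoint name,
# toy data, and optimizer settings are illustrative assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Encoder input and decoder target for a single toy example.
inputs = tokenizer(
    "summarize: nanoT5 pre-trains and fine-tunes T5-style models on modest hardware.",
    return_tensors="pt",
)
labels = tokenizer("nanoT5 trains T5 models cheaply.", return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # cross-entropy over the target tokens
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.3f}")
```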
Alternatives and similar repositories for nanoT5:
Users interested in nanoT5 are comparing it to the repositories listed below
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,331 · Updated 10 months ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,060 · Updated last year
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript ☆576 · Updated 9 months ago
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ☆863 · Updated last year
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,101 · Updated last year
- 🤖 A PyTorch library of curated Transformer models and their composable components ☆884 · Updated last year
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” ☆956 · Updated last year
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆710 · Updated last year
- Convolutions for Sequence Modeling ☆877 · Updated 10 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆695 · Updated last year
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate … ☆632 · Updated last year
- Tune any FALCON in 4-bit ☆466 · Updated last year
- Inference code for Persimmon-8B ☆415 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,470 · Updated last year
- Code for fine-tuning Platypus fam LLMs using LoRA ☆629 · Updated last year
- Language Modeling with the H3 State Space Model ☆520 · Updated last year
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆642 · Updated 4 months ago
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆452 · Updated last year
- A repository for research on medium sized language models. ☆495 · Updated this week
- A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca) ☆1,114 · Updated last year
- LOMO: LOw-Memory Optimization ☆985 · Updated 9 months ago
- A crude RLHF layer on top of nanoGPT with Gumbel-Softmax trick ☆289 · Updated last year
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,378 · Updated last year
- The repository for the code of the UltraFastBERT paper ☆518 · Updated last year
- Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning" ☆450 · Updated last year
- Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization) ☆463 · Updated 2 years ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆2,474 · Updated 8 months ago
- What would you do with 1000 H100s... ☆1,038 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,242 · Updated last month