microsoft / encoder-decoder-slm
Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and vision-language capabilities
☆29 Updated 7 months ago
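The headline technique above, cross-architecture knowledge distillation, is not explained on this page; as a rough orientation only, most distillation setups reduce to a temperature-scaled KL term between teacher and student token distributions. Below is a minimal PyTorch sketch under that assumption; the function name and temperature value are illustrative and not taken from the repository.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Illustrative sketch, not the repository's actual objective.
    # Soften both distributions with the temperature so the student
    # sees the teacher's full ranking over tokens, not just the argmax.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean KL, rescaled by T^2 to keep gradient magnitudes
    # comparable across temperatures (Hinton et al., 2015).
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```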
Alternatives and similar repositories for encoder-decoder-slm
Users interested in encoder-decoder-slm are comparing it to the libraries listed below.
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆63 Updated last week
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆107 Updated 5 months ago
- Experiments from efforts to train a new and improved T5 ☆76 Updated last year
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ☆46 Updated 2 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆27 Updated last year
- Some common Huggingface transformers in maximal update parametrization (µP) ☆82 Updated 3 years ago
- Truly flash T5 implementation! ☆70 Updated last year
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 Updated 6 months ago
- ☆82 Updated last year
- ☆49 Updated 7 months ago
- Supercharge huggingface transformers with model parallelism. ☆77 Updated last month
- Pre-train Static Word Embeddings ☆85 Updated last week
- Code for Zero-Shot Tokenizer Transfer