microsoft / encoder-decoder-slm
Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and vision-language capabilities
☆23 · Updated 2 months ago
Alternatives and similar repositories for encoder-decoder-slm:
Users interested in encoder-decoder-slm are comparing it to the libraries listed below.
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated this week
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 7 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆17 · Updated last month
- Truly flash implementation of the DeBERTa disentangled attention mechanism. ☆45 · Updated last week
- ☆77 · Updated 8 months ago
- ☆48 · Updated 5 months ago
- ☆33 · Updated 10 months ago
- ☆41 · Updated 2 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆28 · Updated last month
- Official implementation of "BERTs are Generative In-Context Learners" ☆26 · Updated last month
- ☆47 · Updated 7 months ago
- ☆43 · Updated last year
- ☆79 · Updated last year
- GoldFinch and other hybrid transformer components ☆45 · Updated 9 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆90 · Updated 9 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆39 · Updated 6 months ago
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval ☆50 · Updated 10 months ago
- ☆25 · Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆42 · Updated 11 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆63 · Updated last year
- Triton implementation of the HyperAttention algorithm ☆47 · Updated last year
- NanoGPT speedrunning for the poor T4 enjoyers ☆62 · Updated this week
- DPO, but faster 🚀 ☆40 · Updated 4 months ago
- ☆49 · Updated last year
- Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models" ☆38 · Updated last week
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- Supercharge Hugging Face Transformers with model parallelism. ☆76 · Updated 6 months ago
- Minimum Description Length probing for neural network representations ☆19 · Updated 2 months ago