mlpen / YOSO

☆21

Alternatives and similar repositories for YOSO

Users who are interested in YOSO are comparing it to the libraries listed below.

dguo98 / DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
☆74 Updated 4 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆124 Updated 5 years ago
wjxts / RegularizedBN
☆21 Updated 2 years ago
pkuzengqi / Skyformer
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021)
☆63 Updated 3 years ago
rabeehk / compacter
☆130 Updated 3 years ago
berlino / gated_linear_attention
☆105 Updated last year
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆56 Updated 2 years ago
varunnair18 / FISH
Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).
☆59 Updated 3 years ago
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆80 Updated last year
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆64 Updated 2 years ago
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆81 Updated 2 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114 Updated last year
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆63 Updated 2 years ago
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65 Updated 4 years ago
benzakenelad / BitFit
Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
☆142 Updated 3 years ago
google-research / head2toe
☆81 Updated last year
Noahs-ARK / RFA
☆33 Updated 4 years ago
princeton-nlp / LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
☆78 Updated 2 years ago
BaohaoLiao / mefts
[NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
☆33 Updated 2 years ago
LAION-AI / Big-Interleaved-Dataset
Big-Interleaved-Dataset
☆57 Updated 2 years ago
uiuctml / MergeBench
[NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs
☆35 Updated 2 months ago
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16 Updated 5 months ago
lucidrains / memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
☆49 Updated 5 years ago
HazyResearch / prefix-linear-attention
☆57 Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆49 Updated 2 years ago
joshr17 / IFM
Code for paper "Can contrastive learning avoid shortcut solutions?" NeurIPS 2021.
☆47 Updated 3 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆87 Updated 2 years ago
Shark-NLP / CAB
☆31 Updated 2 years ago
ag1988 / top_k_attention
The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…
☆70 Updated 4 years ago
QingruZhang / PLATON
This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46 Updated 3 years ago