conceptofmind / t5-pytorch
Implementation of "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" in PyTorch.
☆51 · Updated last year
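The paper casts every task (translation, summarization, classification, and so on) as mapping an input string to an output string, with a short task prefix selecting the task. The sketch below illustrates that text-to-text interface using a pretrained T5 checkpoint from the Hugging Face `transformers` library; it is not this repository's own API, which is not shown on this page.

```python
# Minimal sketch of T5's text-to-text framing: every task is "text in, text out",
# and only the task prefix on the input changes.
# Uses the Hugging Face `transformers` T5 checkpoint, not conceptofmind/t5-pytorch.
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Translation, summarization, classification, etc. all share this interface.
inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# e.g. "Das Haus ist wunderbar."
```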
Alternatives and similar repositories for t5-pytorch
Users interested in t5-pytorch are comparing it to the repositories listed below.
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆129 · Updated last year
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆135 · Updated last year
- ☆85 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆113 · Updated 9 months ago
- ☆107 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆55 · Updated 2 years ago
- Language models scale reliably with over-training and on downstream tasks ☆100 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆119 · Updated 11 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆52 · Updated last year
- ☆106 · Updated last year
- ☆101 · Updated last year
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆177 · Updated last year
- Token Omission Via Attention ☆128 · Updated 11 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆55 · Updated 8 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆89 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆99 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆180 · Updated 3 months ago
- Here we will test various linear attention designs. ☆61 · Updated last year
- ☆95 · Updated last year
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆43 · Updated 11 months ago
- Mixture of A Million Experts ☆48 · Updated last year
- Implementation of the paper "AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning" (https://arxiv.org/abs/2205.1…) ☆134 · Updated 2 years ago
- Replicating O1 inference-time scaling laws ☆90 · Updated 10 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 · Updated last year
- ☆151 · Updated 10 months ago
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆82 · Updated last year
- ☆52 · Updated last year
- Implementation of the conditionally routed attention in the CoLT5 architecture, in PyTorch ☆230 · Updated last year
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆48 · Updated 2 years ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆90 · Updated last year