Knowledgator / TurboT5Links

Truly flash T5 realization!

☆70

Alternatives and similar repositories for TurboT5

Users that are interested in TurboT5 are comparing it to the libraries listed below

Sorting:

catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆107Updated 6 months ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56Updated this week
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆115Updated 2 years ago
euclaise / supertrainer2000
☆49Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆82Updated 3 years ago
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆138Updated 8 months ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆89Updated 2 months ago
Upaya07 / NeurIPS-llm-efficiency-challenge
Code for NeurIPS LLM Efficiency Challenge
☆59Updated last year
Knowledgator / FlashDeBERTa
Trully flash implementation of DeBERTa disentangled attention mechanism.
☆66Updated last week
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆27Updated last year
luyug / magix
Supercharge huggingface transformers with model parallelism.
☆77Updated 2 months ago
microsoft / encoder-decoder-slm
Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi…
☆30Updated 8 months ago
Zyphra / Zyda_processing
☆39Updated last year
mgmalek / efficient_cross_entropy
☆122Updated last year
trapoom555 / Language-Model-STS-CFT
Improving Text Embedding of Language Models Using Contrastive Fine-tuning
☆64Updated last year
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated last year
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆49Updated last year
bminixhofer / tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
☆46Updated 3 months ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆95Updated 2 years ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
jxmorris12 / bm25_pt
minimal pytorch implementation of bm25 (with sparse tensors)
☆104Updated last year
Aleph-Alpha-Research / trigrams
☆57Updated last week
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆39Updated 11 months ago
jkallini / mrt5
Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models."
☆51Updated 2 weeks ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 4 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆99Updated last year
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆34Updated 2 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230Updated last year