RobertCsordas / moe_attentionLinks

Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"

☆99

Alternatives and similar repositories for moe_attention

Users that are interested in moe_attention are comparing it to the libraries listed below

Sorting:

TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 4 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 6 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129Updated last year
RobertCsordas / moeut
☆86Updated last year
SalesforceAIResearch / GemFilter
☆85Updated 9 months ago
kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆30Updated 3 months ago
HanGuo97 / lq-lora
☆127Updated last year
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆45Updated 10 months ago
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆44Updated last year
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147Updated last year
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56Updated this week
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89Updated last year
YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆103Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
IST-DASLab / QuEST
Work in progress.
☆74Updated 3 months ago
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆36Updated 2 weeks ago
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48Updated 2 years ago
wdlctc / mini-s
☆52Updated 11 months ago
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆49Updated 6 months ago
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆123Updated 9 months ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆117Updated last year
thepowerfuldeez / OLMo
My fork os allen AI's OLMo for educational purposes.
☆30Updated 10 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆98Updated 11 months ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127Updated last year
FasterDecoding / BitDelta
☆202Updated 10 months ago
minyoungg / LTE
☆69Updated last year
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆115Updated 8 months ago