RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆97 · Updated 8 months ago
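The header above only names the paper; as context for readers unfamiliar with mixture-of-experts attention, below is a minimal, illustrative sketch of the general idea: value/output projections selected per token from a small set of experts via sparse top-k routing, with shared query/key projections. The class name, dimensions, per-token sigmoid routing, and dense gate-weighted mixing are assumptions made for brevity; this is not code from this repository and does not reproduce the paper's exact routing scheme or its efficiency optimizations.

```python
# Illustrative sketch only (not from the SwitchHead repository): attention with
# expert-routed value/output projections and shared query/key projections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAttentionSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_experts=4, k=2):
        super().__init__()
        self.n_heads, self.d_head, self.k = n_heads, d_model // n_heads, k
        # Shared query/key projections; expert-specific value/output projections.
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_experts = nn.Parameter(torch.randn(n_experts, d_model, d_model) * d_model**-0.5)
        self.o_experts = nn.Parameter(torch.randn(n_experts, d_model, d_model) * d_model**-0.5)
        self.router = nn.Linear(d_model, n_experts, bias=False)

    def mix(self, x, experts, gates):
        # Gate-weighted mixture of expert projections. For clarity this computes
        # all experts densely; an efficient implementation would evaluate only
        # the top-k experts selected for each token.
        proj = torch.einsum('btd,edf->btef', x, experts)   # per-expert projections
        return torch.einsum('btef,bte->btf', proj, gates)  # gate-weighted sum

    def forward(self, x):
        B, T, D = x.shape
        gates = torch.sigmoid(self.router(x))              # (B, T, n_experts)
        topk = gates.topk(self.k, dim=-1)
        sparse_gates = torch.zeros_like(gates).scatter(-1, topk.indices, topk.values)

        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = self.mix(x, self.v_experts, sparse_gates).view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        att = att.transpose(1, 2).reshape(B, T, D)
        return self.mix(att, self.o_experts, sparse_gates)

x = torch.randn(2, 16, 256)
print(MoEAttentionSketch()(x).shape)  # torch.Size([2, 16, 256])
```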
Alternatives and similar repositories for moe_attention
Users interested in moe_attention are comparing it to the libraries listed below.
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆155 · Updated last month
- ☆79 · Updated 9 months ago
- A repository for research on medium sized language models. ☆76 · Updated last year
- Work in progress. ☆67 · Updated this week
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆48 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆143 · Updated 8 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 5 months ago
- ☆50 · Updated 7 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆42 · Updated last year
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆92 · Updated last week
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 · Updated 8 months ago
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆124 · Updated 9 months ago
- ☆125 · Updated last year
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ☆47 · Updated last month
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated 11 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆160 · Updated 11 months ago
- Implementation of Infini-Transformer in PyTorch ☆111 · Updated 5 months ago
- ☆79 · Updated 4 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆63 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆86 · Updated last year
- ☆103 · Updated last year
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆221 · Updated last month
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆103 · Updated 2 years ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 9 months ago
- ☆92 · Updated 8 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 · Updated last year
- DPO, but faster 🚀 ☆42 · Updated 5 months ago
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆101 · Updated 11 months ago