☆17Jun 11, 2025Updated 10 months ago
Alternatives and similar repositories for switchhead
Users that are interested in switchhead are comparing it to the libraries listed below.
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆101Sep 30, 2024Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆39Jun 11, 2025Updated 10 months ago
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- This repository contains the python scripts developed as a part of the work presented in the paper "STAnet: A Spatiotemporal Attention Ne…☆15May 10, 2023Updated 2 years ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆70Jul 30, 2024Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆56Feb 28, 2023Updated 3 years ago
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…☆21Sep 10, 2024Updated last year
- Implementation for the paper 'Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport' (ICL…☆19Jan 1, 2025Updated last year
- Linear Attention Sequence Parallelism (LASP)☆88Jun 4, 2024Updated last year
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- [EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering☆17Oct 31, 2024Updated last year
- ☆36Jan 7, 2026Updated 3 months ago
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference☆30Mar 14, 2024Updated 2 years ago
- ☆18May 18, 2023Updated 2 years ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections☆21Oct 15, 2024Updated last year
- Mixture of Attention Heads☆52Oct 10, 2022Updated 3 years ago
- TU Delft Blockchain Engineering course project on scale-out distributed ledger☆13Mar 4, 2018Updated 8 years ago
- ☆91Aug 18, 2024Updated last year
- Papers and Public Datasets for Medical Vision-Language Learning☆19Apr 27, 2023Updated 2 years ago
- Triton implementation of bi-directional (non-causal) linear attention☆74Mar 1, 2026Updated last month
- Scalable and Stable Parallelization of Nonlinear RNNs☆30Mar 6, 2026Updated last month
- Implementation of elastico protocol☆13Jun 12, 2019Updated 6 years ago
- ☆32Apr 2, 2025Updated last year
- ☆20May 16, 2024Updated last year
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization☆18Mar 7, 2025Updated last year
- ☆26Mar 26, 2025Updated last year
- ☆13Mar 5, 2023Updated 3 years ago
- Long CoT Fine-Tuning and Reinforcement Learning for LLMs in the Context of the 24-Point Game: A Toy Project☆25Feb 22, 2025Updated last year
- PIPA (Progressive Intelligent Performance Analytics) is a platform that aggregates a complete toolchain of performance data collection, p…☆26Mar 5, 2026Updated last month
- ☆16Dec 10, 2022Updated 3 years ago
- Triton-based implementation of Sparse Mixture of Experts.☆272Oct 3, 2025Updated 6 months ago
- FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing [NeurIPS 2024]☆31Aug 12, 2025Updated 8 months ago
- Tool to parse wiki tables from the HTML dump of Wikipedia☆11Jun 12, 2022Updated 3 years ago
- Reference implementation of models from Nyonic Model Factory☆12May 13, 2024Updated last year
- The implementation of ASAD_DenseNet☆31Mar 24, 2025Updated last year
- Learning Situation Hyper-Graphs for Video Question Answering☆22Feb 16, 2024Updated 2 years ago
- ☆27Nov 25, 2025Updated 4 months ago
- Project that gathers state-of-the-art knowledge distillation approaches for unsupervised anomaly detection☆15Oct 10, 2025Updated 6 months ago
- Kinetics: Rethinking Test-Time Scaling Laws☆87Jul 11, 2025Updated 9 months ago