☆17Jun 11, 2025Updated last year
Alternatives and similar repositories for switchhead
Users that are interested in switchhead are comparing it to the libraries listed below.
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆39Jun 11, 2025Updated last year
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆32Nov 12, 2024Updated last year
- ☆48Dec 19, 2025Updated 5 months ago
- ☆15Oct 19, 2024Updated last year
- This repository contains the python scripts developed as a part of the work presented in the paper "STAnet: A Spatiotemporal Attention Ne…☆15May 10, 2023Updated 3 years ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆74Jul 30, 2024Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆56Feb 28, 2023Updated 3 years ago
- Stochastic Cellular Automata epidemic models in Python with 2D simulations☆15Feb 24, 2020Updated 6 years ago
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…☆21Sep 10, 2024Updated last year
- [NeurIPS 2024] BLAST: Block Level Adaptive Structured Matrix for Efficient Deep Neural Network Inference☆18Nov 6, 2024Updated last year
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- CLIP-MoE: Mixture of Experts for CLIP☆58Oct 10, 2024Updated last year
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference☆30Mar 14, 2024Updated 2 years ago
- ☆18May 18, 2023Updated 3 years ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections☆22Oct 15, 2024Updated last year
- ☆10Dec 13, 2022Updated 3 years ago
- Mixture of Attention Heads☆52Oct 10, 2022Updated 3 years ago
- PyTorch implementation of EEGDfus☆29Oct 9, 2025Updated 8 months ago
- ☆93Aug 18, 2024Updated last year
- ☆12Apr 25, 2025Updated last year
- Triton implementation of bi-directional (non-causal) linear attention☆76Mar 1, 2026Updated 3 months ago
- Scalable and Stable Parallelization of Nonlinear RNNs☆31Mar 6, 2026Updated 3 months ago
- ☆34Apr 2, 2025Updated last year
- ☆62May 4, 2024Updated 2 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆25Jun 6, 2024Updated 2 years ago
- Official PyTorch implementation of CD-MOE☆12Mar 18, 2026Updated 2 months ago
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization☆19Mar 7, 2025Updated last year
- ☆27Mar 26, 2025Updated last year
- ☆13Mar 5, 2023Updated 3 years ago
- ☆16Dec 10, 2022Updated 3 years ago
- Official implementation for the IJCAI'24 paper: SDformer☆33Mar 6, 2025Updated last year
- Triton-based implementation of Sparse Mixture of Experts.☆278Oct 3, 2025Updated 8 months ago
- Reference implementation of models from Nyonic Model Factory☆12May 13, 2024Updated 2 years ago
- the implementation of the ASAD_DenseNet☆31Mar 24, 2025Updated last year
- [ICLR'23] Effective Self-supervised Pre-training on Low-compute networks without Distillation☆18Oct 9, 2024Updated last year
- ☆27May 12, 2026Updated last month
- Kinetics: Rethinking Test-Time Scaling Laws☆87Jul 11, 2025Updated 11 months ago
- ☆35Apr 12, 2024Updated 2 years ago
- Clustered Compositional Embeddings☆13Oct 25, 2023Updated 2 years ago