☆17 · Jun 11, 2025 · Updated 9 months ago
Alternatives and similar repositories for switchhead
Users that are interested in switchhead are comparing it to the libraries listed below.
- sigma-MoE layer ☆21 · Jan 5, 2024 · Updated 2 years ago
- ☆11 · Sep 7, 2024 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆39 · Jun 11, 2025 · Updated 9 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 · Nov 12, 2024 · Updated last year
- ☆11 · Sep 20, 2024 · Updated last year
- This repository contains the python scripts developed as a part of the work presented in the paper "Low-latency auditory spatial attentio… ☆10 · Sep 15, 2021 · Updated 4 years ago
- This repository contains the python scripts developed as a part of the work presented in the paper "STAnet: A Spatiotemporal Attention Ne… ☆15 · May 10, 2023 · Updated 2 years ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆69 · Jul 30, 2024 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… ☆21 · Sep 10, 2024 · Updated last year
- Implementation for the paper 'Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport' (ICL… ☆17 · Jan 1, 2025 · Updated last year
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference ☆30 · Mar 14, 2024 · Updated 2 years ago
- ☆18 · May 18, 2023 · Updated 2 years ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections ☆21 · Oct 15, 2024 · Updated last year
- Mixture of Attention Heads ☆52 · Oct 10, 2022 · Updated 3 years ago
- PyTorch implementation of EEGDfus ☆21 · Oct 9, 2025 · Updated 5 months ago
- A comprehensive list of papers about Large-Language-Diffusion-Models. ☆62 · Mar 2, 2026 · Updated 3 weeks ago
- ☆91 · Aug 18, 2024 · Updated last year
- Triton implement of bi-directional (non-causal) linear attention ☆71 · Mar 1, 2026 · Updated 3 weeks ago
- ☆12 · Apr 25, 2025 · Updated 10 months ago
- Scalable and Stable Parallelization of Nonlinear RNNS ☆29 · Mar 6, 2026 · Updated 2 weeks ago
- ☆32 · Apr 2, 2025 · Updated 11 months ago
- ☆20 · May 16, 2024 · Updated last year
- ☆26 · Mar 26, 2025 · Updated 11 months ago
- FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing [NeurIPS 2024] ☆30 · Aug 12, 2025 · Updated 7 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆270 · Oct 3, 2025 · Updated 5 months ago
- Official implementation for the IJCAI'24 paper: SDformer ☆31 · Mar 6, 2025 · Updated last year
- Repository in Support of EAGLE Submission ☆22 · Oct 11, 2025 · Updated 5 months ago
- An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection. ☆24 · Feb 20, 2025 · Updated last year
- Tool to parse wiki tables from the HTML dump of Wikipedia ☆11 · Jun 12, 2022 · Updated 3 years ago
- Reference implementation of models from Nyonic Model Factory ☆12 · May 13, 2024 · Updated last year
- the implementation of the ASAD_DenseNet ☆30 · Mar 24, 2025 · Updated last year
- [ICLR'23] Effective Self-supervised Pre-training on Low-compute networks without Distillation ☆18 · Oct 9, 2024 · Updated last year
- ☆27 · Nov 25, 2025 · Updated 3 months ago
- Project that regroup the state-of-the-art knowledge distillation approaches for unsupervised anomaly detection ☆14 · Oct 10, 2025 · Updated 5 months ago
- ☆35 · Apr 12, 2024 · Updated last year
- Clustered Compositional Embeddings ☆11 · Oct 25, 2023 · Updated 2 years ago
- Residual vector quantization for KV cache compression in large language model ☆12 · Oct 22, 2024 · Updated last year