PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆69Aug 22, 2023Updated 2 years ago
Alternatives and similar repositories for soft-moe
Users that are interested in soft-moe are comparing it to the libraries listed below.
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)☆83Oct 5, 2023Updated 2 years ago
- ☆713Dec 6, 2025Updated 3 months ago
- ☆21Oct 22, 2025Updated 5 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆18Apr 25, 2025Updated 11 months ago
- ☆27Jul 11, 2024Updated last year
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆156Jul 9, 2025Updated 8 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- ☆93Apr 3, 2023Updated 2 years ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- GMoE could be the next backbone model for many kinds of generalization tasks.☆273Mar 21, 2023Updated 3 years ago
- sigma-MoE layer☆21Jan 5, 2024Updated 2 years ago
- ☆30Sep 28, 2023Updated 2 years ago
- The official repository for the experiments included in the paper titled "Patch-level Routing in Mixture-of-Experts is Provably Sample-ef…☆14Feb 12, 2026Updated last month
- ☆18Jun 20, 2024Updated last year
- PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [CVPR 2024]☆19Apr 14, 2024Updated last year
- An unofficial implementation for paper "DenseCLIP: Extract Free Dense Labels from CLIP"☆24Jan 27, 2022Updated 4 years ago
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆29Mar 16, 2026Updated last week
- Code our team used while working on Problem C of the 2023 China Undergraduate Mathematical Contest in Modeling, now open-sourced for everyone!☆11Jan 15, 2024Updated 2 years ago
- ☆20Mar 17, 2026Updated last week
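- Second-year foundational practice project at the Xidian University School of Artificial Intelligence: hyperspectral image target detection☆10Jan 15, 2024Updated 2 years ago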
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning☆24Jan 19, 2025Updated last year
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- ☆10Mar 18, 2025Updated last year
- [NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning☆233Dec 3, 2024Updated last year
- Self-reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL)☆17May 24, 2024Updated last year
- [CVPR 2025] CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answeri…☆53Jun 16, 2025Updated 9 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆19Jul 20, 2024Updated last year
- ICML 2024 Paper "Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies"☆17Jul 10, 2024Updated last year
- This is an SPM12 batch script that runs a standard fMRI preprocessing pipeline on a BIDS-formatted dataset.☆13Nov 19, 2020Updated 5 years ago
- SAM4SS: Tailoring SAM and SAM2 for Semantic Segmentation☆11Jul 31, 2024Updated last year
- Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification (IEEE TMI 2024)☆67Mar 17, 2025Updated last year
- [ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models☆56Jul 9, 2024Updated last year
- An awesome list that curates the best Flet tools, tutorials, blogs and more.☆10Jan 8, 2023Updated 3 years ago
- JAX Scalify: end-to-end scaled arithmetics☆18Oct 30, 2024Updated last year
- Streaming Thinking for VideoLLM Streaming Video Understanding☆86Mar 13, 2026Updated 2 weeks ago
- Official Repository for ICML 2024 Paper "OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport"☆23Dec 4, 2025Updated 3 months ago
- RadGraph: Extracting Clinical Entities and Relations from Radiology Reports☆13Nov 22, 2022Updated 3 years ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- [WACV 2025, Best Student Paper, Oral] GeoDiffuser: Geometry-Based Image Editing with Diffusion Models☆22Mar 22, 2025Updated last year