PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆70 · Aug 22, 2023 · Updated 2 years ago
Alternatives and similar repositories for soft-moe
Users that are interested in soft-moe are comparing it to the libraries listed below.
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆83 · Oct 5, 2023 · Updated 2 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆345 · Apr 2, 2025 · Updated last year
- ☆717 · Dec 6, 2025 · Updated 5 months ago
- ☆22 · Oct 22, 2025 · Updated 6 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated last year
- ☆28 · Jul 11, 2024 · Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 10 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆159 · Jul 9, 2025 · Updated 9 months ago
- ☆19 · Nov 5, 2024 · Updated last year
- ☆95 · Apr 3, 2023 · Updated 3 years ago
- GoldFinch and other hybrid transformer components ☆46 · Jul 20, 2024 · Updated last year
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆383 · Jun 17, 2024 · Updated last year
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 · Jun 17, 2024 · Updated last year
- Code for the paper "Interpreting and Improving Diffusion Models from an Optimization Perspective", appearing in ICML 2024 ☆14 · Sep 30, 2024 · Updated last year
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- The official repository for the experiments included in the paper titled "Patch-level Routing in Mixture-of-Experts is Provably Sample-ef… ☆14 · Feb 12, 2026 · Updated 2 months ago
- An unofficial implementation for paper "DenseCLIP: Extract Free Dense Labels from CLIP" ☆24 · Jan 27, 2022 · Updated 4 years ago
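- Code our team used while working on Problem C of the 2023 National Undergraduate Mathematical Contest in Modeling, now open-sourced for everyone! ☆11 · Jan 15, 2024 · Updated 2 years ago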
- ☆23 · Mar 17, 2026 · Updated last month
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation ☆40 · Apr 21, 2026 · Updated 2 weeks ago
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning ☆24 · Jan 19, 2025 · Updated last year
- [NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning ☆238 · Dec 3, 2024 · Updated last year
- Self-reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL) ☆17 · May 24, 2024 · Updated last year
- Related papers about Referring Image Segmentation (RIS) ☆16 · Dec 26, 2023 · Updated 2 years ago
- [CVPR 2025] CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answeri… ☆55 · Jun 16, 2025 · Updated 10 months ago
- Building language models to predict more than one token ahead to enable further ahead predictions. ☆12 · May 22, 2025 · Updated 11 months ago
- SAM4SS: Tailoring SAM and SAM2 for Semantic Segmentation ☆11 · Jul 31, 2024 · Updated last year
- [IEEE TIP 2025] This repo is the official implementation of "STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation" ☆26 · Jul 17, 2025 · Updated 9 months ago
- Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification (IEEE TMI 2024) ☆68 · Mar 17, 2025 · Updated last year
- [ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations ☆19 · Oct 18, 2025 · Updated 6 months ago
- Recent Advances in Vision-Language Pre-training! ☆32 · Jan 10, 2022 · Updated 4 years ago
- Streaming Thinking for VideoLLM Streaming Video Understanding ☆97 · Mar 30, 2026 · Updated last month
- [ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models ☆56 · Jul 9, 2024 · Updated last year
- Official PyTorch implementation of our ICCV2023 paper “When Prompt-based Incremental Learning Does Not Meet Strong Pretraining” ☆16 · Jan 8, 2024 · Updated 2 years ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- Official Repository for ICML 2024 Paper "OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport" ☆24 · Dec 4, 2025 · Updated 5 months ago
- [WACV 2025, Best Student Paper, Oral] GeoDiffuser: Geometry-Based Image Editing with Diffusion Models ☆22 · Mar 22, 2025 · Updated last year
- The official implementation of Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion [AAAI'2… ☆16 · Feb 2, 2026 · Updated 3 months ago
- ☆21 · Apr 16, 2024 · Updated 2 years ago