PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆68 · Aug 22, 2023 · Updated 2 years ago
Alternatives and similar repositories for soft-moe
Users who are interested in soft-moe are comparing it to the repositories listed below.
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated 10 months ago
- ☆19 · Nov 5, 2024 · Updated last year
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 · Jun 17, 2024 · Updated last year
- The official implementation of Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion [AAAI'2… ☆15 · Feb 2, 2026 · Updated last month
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- ☆19 · Updated this week
- GoldFinch and other hybrid transformer components ☆45 · Jul 20, 2024 · Updated last year
- Official Implementation of Video-MA2MBA ☆12 · Dec 3, 2024 · Updated last year
- ⚓️ Interactive playground for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆18 · Dec 20, 2025 · Updated 2 months ago
- Code for the paper "Interpreting and Improving Diffusion Models from an Optimization Perspective", appearing in ICML 2024 ☆14 · Sep 30, 2024 · Updated last year
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- This is the implementation of the paper "Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians" (CVPR 2025). ☆18 · May 15, 2025 · Updated 9 months ago
- [ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations ☆18 · Oct 18, 2025 · Updated 4 months ago
- ☆16 · May 13, 2025 · Updated 9 months ago
- An unofficial implementation for paper "DenseCLIP: Extract Free Dense Labels from CLIP" ☆23 · Jan 27, 2022 · Updated 4 years ago
- Building language models to predict more than one token ahead to enable further ahead predictions. ☆12 · May 22, 2025 · Updated 9 months ago
- The official repository for the experiments included in the paper titled "Patch-level Routing in Mixture-of-Experts is Provably Sample-ef… ☆14 · Feb 12, 2026 · Updated 2 weeks ago
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Feb 13, 2025 · Updated last year
- [ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models ☆56 · Jul 9, 2024 · Updated last year
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training" ☆36 · Apr 21, 2024 · Updated last year
- sigma-MoE layer ☆21 · Jan 5, 2024 · Updated 2 years ago
- Official PyTorch implementation of our ICCV2023 paper “When Prompt-based Incremental Learning Does Not Meet Strong Pretraining” ☆16 · Jan 8, 2024 · Updated 2 years ago
- ☆17 · Jun 20, 2024 · Updated last year
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning ☆23 · Jan 19, 2025 · Updated last year
- Official code repository of Shuffle-R1 ☆25 · Feb 23, 2026 · Updated last week
- Official implementation of "CLIP-VQDiffusion: Language Free Training of Text To Image generation using CLIP and vector quantized diffusi… ☆18 · Sep 5, 2024 · Updated last year
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 · Jul 1, 2025 · Updated 8 months ago
- Official repo of the paper "Eliminating Position Bias of Language Models: A Mechanistic Approach" ☆20 · Jun 13, 2025 · Updated 8 months ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆17 · Mar 31, 2025 · Updated 11 months ago
- ICML 2024 Paper "Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies" ☆17 · Jul 10, 2024 · Updated last year
- Official implementation of the WACV 2025 paper "3D Part Segmentation via Geometric Aggregation of 2D Visual Features" ☆25 · Jun 8, 2025 · Updated 8 months ago
- Official Repository for ICML 2024 Paper "OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport" ☆23 · Dec 4, 2025 · Updated 3 months ago
- [NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO ☆79 · Oct 29, 2025 · Updated 4 months ago
- Self-reproduction code of the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL) ☆17 · May 24, 2024 · Updated last year
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation ☆19 · Dec 22, 2024 · Updated last year
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆154 · Jul 9, 2025 · Updated 7 months ago
- Related papers about Referring Image Segmentation (RIS) ☆16 · Dec 26, 2023 · Updated 2 years ago