[ICML 2025 Oral] Mixture of Lookup Experts
☆73 · Dec 3, 2025 · Updated 3 months ago
Alternatives and similar repositories for MoLE
Users who are interested in MoLE are comparing it to the repositories listed below.
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model ☆13 · Feb 11, 2025 · Updated last year
- [ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization ☆25 · Oct 5, 2025 · Updated 5 months ago
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. ☆16 · Apr 8, 2025 · Updated 11 months ago
- Code for "M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis" ☆14 · Mar 31, 2025 · Updated 11 months ago
- ☆22 · Dec 11, 2024 · Updated last year
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆109 · Dec 20, 2024 · Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 8 months ago
- Code and resources for the NeurIPS 2025 paper "BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset" by Zhiheng X… ☆19 · Oct 14, 2025 · Updated 5 months ago
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks… ☆16 · Apr 29, 2025 · Updated 10 months ago
- LCA-on-the-line (ICML 2024 Oral) ☆13 · Feb 13, 2025 · Updated last year
- ☆46 · Sep 27, 2025 · Updated 5 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated last year
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆20 · Apr 9, 2025 · Updated 11 months ago
- ☆133 · Jun 6, 2025 · Updated 9 months ago
- [ECCV 2022] Contrastive Prototypical Network with Wasserstein Confidence Penalty ☆11 · Oct 20, 2022 · Updated 3 years ago
- Data and code required to reach the main conclusions of the fastsmcg paper ☆10 · Sep 19, 2023 · Updated 2 years ago
- ☆29 · Mar 13, 2026 · Updated 2 weeks ago
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models ☆36 · Nov 4, 2025 · Updated 4 months ago
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆69 · Jul 30, 2024 · Updated last year
- Official implementation of "Mixture of Experts Meets Prompt-Based Continual Learning" (NeurIPS 2024) ☆44 · Aug 1, 2025 · Updated 7 months ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding ☆20 · Mar 2, 2025 · Updated last year
- Home page for the Microsoft Phi-Ground tech report ☆23 · Sep 8, 2025 · Updated 6 months ago
- Extended implementation of RoboDexVLM (IROS 2025) ☆38 · Nov 13, 2025 · Updated 4 months ago
- Mixture-of-Experts Multimodal Variational Autoencoder ☆15 · Jul 3, 2025 · Updated 8 months ago
- ☆13 · Mar 28, 2025 · Updated 11 months ago
- KV cache compression via sparse coding ☆17 · Oct 26, 2025 · Updated 5 months ago
- Research work aimed at addressing the problem of modeling infinite-length context ☆48 · Dec 18, 2025 · Updated 3 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆135 · Apr 12, 2025 · Updated 11 months ago
- ☆13 · May 15, 2025 · Updated 10 months ago
- ☆18 · Oct 26, 2024 · Updated last year
- Official code for "Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping" (ICLR 2025) ☆29 · Oct 25, 2025 · Updated 5 months ago
- Source code for SWIFT, an efficient reward model. ☆19 · Jan 13, 2026 · Updated 2 months ago
- ☆136 · May 29, 2025 · Updated 9 months ago
- ☆11 · Sep 20, 2024 · Updated last year
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… ☆21 · Sep 10, 2024 · Updated last year
- Prototype of MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism ☆27 · Apr 4, 2025 · Updated 11 months ago
- Official implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration ☆30 · Nov 22, 2025 · Updated 4 months ago
- Official repository for the CVPR 2024 paper "Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks". ☆20 · Jun 21, 2024 · Updated last year
- Official implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆45 · Feb 13, 2025 · Updated last year