[ICML 2025 Oral] Mixture of Lookup Experts
☆72Dec 3, 2025Updated 5 months ago
Alternatives and similar repositories for MoLE
Users that are interested in MoLE are comparing it to the libraries listed below.
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model☆13Feb 11, 2025Updated last year
- [ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization☆24Oct 5, 2025Updated 7 months ago
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models.☆16Apr 8, 2025Updated last year
- [NeurIPS 2024] Official Implementation of "SDformer: Similarity-driven Discrete Transformer For Time Series Generation"☆15May 23, 2025Updated 11 months ago
- [AAAI 2025] Official Implementation of "HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting"☆18Feb 17, 2025Updated last year
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.☆113Dec 20, 2024Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 9 months ago
- ☆22Oct 22, 2025Updated 6 months ago
- [EMNLP'23] Code for 'Rethinking Negative Pairs in Code Search'☆14Oct 17, 2023Updated 2 years ago
- Code and resources for the NeurIPS 2025 Paper "BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset" by Zhiheng X…☆19Oct 14, 2025Updated 6 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆61Feb 7, 2025Updated last year
- Implementing DP/TP/PP with torch.distributed☆13Dec 28, 2023Updated 2 years ago
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks…☆16Apr 29, 2025Updated last year
- LCA-on-the-line (ICML 2024 Oral)☆14Feb 13, 2025Updated last year
- [ICLR 2025] Official Implementation of "TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting"☆25Mar 10, 2025Updated last year
- ☆46Sep 27, 2025Updated 7 months ago
- Scaling Laws for Mixture of Experts Models☆15Feb 25, 2025Updated last year
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Mar 10, 2025Updated last year
- ☆135Jun 6, 2025Updated 11 months ago
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…☆25Oct 13, 2025Updated 6 months ago
- [ECCV 2022] Contrastive Prototypical Network with Wasserstein Confidence Penalty☆11Oct 20, 2022Updated 3 years ago
- [EMNLP'22] Code for 'Exploring Representation-level Augmentation for Code Search'☆27Oct 9, 2023Updated 2 years ago
- [ACL'24 Oral] Code for 'Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search'☆18Sep 30, 2024Updated last year
- Data and code required to reach the main conclusions of the fastsmcg paper☆10Sep 19, 2023Updated 2 years ago
- ☆29Mar 13, 2026Updated last month
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model☆25Oct 12, 2024Updated last year
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models☆36Nov 4, 2025Updated 6 months ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆71Jul 30, 2024Updated last year
- Official implementation of "Mixture of Experts Meets Prompt-Based Continual Learning" (NeurIPS 2024)☆45Aug 1, 2025Updated 9 months ago
- ☆25Apr 13, 2025Updated last year
- Blog related files.☆14Jan 8, 2026Updated 3 months ago
- ☆11Dec 24, 2024Updated last year
- Mixture-of-Experts Multimodal Variational Autoencoder☆15Jul 3, 2025Updated 10 months ago
- Extended implementation of RoboDexVLM (IROS 2025)☆39Nov 13, 2025Updated 5 months ago
- ☆13Mar 28, 2025Updated last year
- KV cache compression via sparse coding☆17Oct 26, 2025Updated 6 months ago
- The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking"☆19Jan 25, 2025Updated last year
- This is the implementation of our CVPR'23 paper "Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition".☆22Dec 16, 2023Updated 2 years ago
- Research work aimed at addressing the problem of modeling infinite-length context☆48Dec 18, 2025Updated 4 months ago