Explore Inter-layer Expert Affinity in MoE Model Inference
☆16May 6, 2024Updated last year
Alternatives and similar repositories for ExFlow
Users who are interested in ExFlow are comparing it to the libraries listed below
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".☆19Oct 30, 2024Updated last year
- ☆58May 4, 2024Updated last year
- Code release for AdapMoE accepted by ICCAD 2024☆35Apr 28, 2025Updated 10 months ago
- This repository presents the source code for the paper "MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Qu…☆23Apr 2, 2025Updated 10 months ago
- ☆20Feb 10, 2025Updated last year
- ASIC simulation of a multi-ported memory module. It offers an SRAM-based dual-port basic building block to support multiple read/write …☆22May 30, 2016Updated 9 years ago
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs☆23Nov 11, 2025Updated 3 months ago
- ☆29May 24, 2024Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration☆260Nov 18, 2024Updated last year
- Pure Java Llama2 inference with optional multi-GPU CUDA implementation☆13Sep 2, 2023Updated 2 years ago
- PyTorch library for cost-effective, fast and easy serving of MoE models.☆284Updated this week
- ☆10Mar 8, 2025Updated 11 months ago
- ☆10Jul 6, 2021Updated 4 years ago
- Curated collection of papers in MoE model inference☆343Oct 20, 2025Updated 4 months ago
- ☆89Apr 2, 2022Updated 3 years ago
- ☆57Nov 29, 2025Updated 3 months ago
- ☆16Aug 9, 2025Updated 6 months ago
- The code based on vLLM for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”.☆11Sep 19, 2024Updated last year
- A standalone CXL-enabled system simulator.☆18Jan 10, 2026Updated last month
- ☆15Nov 11, 2024Updated last year
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models.☆16Apr 8, 2025Updated 10 months ago
- ☆12Apr 6, 2025Updated 10 months ago
- Course Project for High Level Chip Design (高层次芯片设计)☆17Jan 2, 2025Updated last year
- KAF: Kolmogorov-Arnold Fourier Networks☆20Feb 19, 2025Updated last year
- ☆12Updated this week
- Chameleon: A MatMul-Free TCN Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data☆25Jun 6, 2025Updated 8 months ago
- Integrating Event-based Dynamic Vision Sensors with Sparse Hyperdimensional Computing☆12Jul 9, 2020Updated 5 years ago
- ☆12Aug 18, 2023Updated 2 years ago
- JAX Scalify: end-to-end scaled arithmetics☆18Oct 30, 2024Updated last year
- ☆301Jul 10, 2025Updated 7 months ago
- Compiler for Dynamic Neural Networks☆45Nov 13, 2023Updated 2 years ago
- Inference framework for MoE layers based on TensorRT with Python binding☆41May 31, 2021Updated 4 years ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models☆49Nov 5, 2024Updated last year
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆56May 29, 2024Updated last year
- Linux kernel technical documentation☆16Jan 12, 2026Updated last month
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆15Feb 4, 2025Updated last year
- Battleship environment for reinforcement learning tasks☆14Apr 29, 2023Updated 2 years ago
- [ECCV 2024] CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs☆18Jul 2, 2024Updated last year
- ☆11Sep 30, 2023Updated 2 years ago