Code repo for efficient quantized MoE inference with mixture of low-rank compensators
☆34Apr 14, 2025Updated 11 months ago
Alternatives and similar repositories for MiLo
Users who are interested in MiLo are comparing it to the libraries listed below
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"☆30Jun 30, 2025Updated 8 months ago
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".☆19Oct 30, 2024Updated last year
- Repo for EmbedLLM: Learning Compact Representations of Large Language Models☆29Sep 25, 2025Updated 5 months ago
- This is the implementation for the paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).☆14May 16, 2021Updated 4 years ago
- Continuous Pipelined Speculative Decoding☆18Jan 4, 2026Updated 2 months ago
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention☆53Aug 6, 2025Updated 7 months ago
- A minimum demo for PyTorch distributed extension functionality for collectives.☆15Jul 29, 2024Updated last year
- Virtual node analysis on the OGB benchmark dataset☆14Mar 9, 2023Updated 3 years ago
- Sparse-dense matrix-matrix multiplication on GPUs☆14Oct 15, 2018Updated 7 years ago
- ☆33Oct 13, 2025Updated 5 months ago
- Cosmic Tagging Network for Neutrino Physics☆13Jun 26, 2024Updated last year
- A Micro-benchmarking Tool for HPC Networks☆34Sep 2, 2025Updated 6 months ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025)☆15Jan 6, 2026Updated 2 months ago
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling☆54Jul 15, 2025Updated 8 months ago
- Code repository for "Prioritizing Repurposable Drugs for SARS-CoV-2 using Deep Learning and Population-based Validation"☆16May 2, 2025Updated 10 months ago
- This repository contains the SwinV2_Weather model, developed for the "Analyzing and Exploring Training Recipes for Large-Scale Transforme…☆20Oct 8, 2024Updated last year
- ☆26Sep 3, 2020Updated 5 years ago
- Standardized higher-order datasets with corresponding datasheets☆19Aug 17, 2025Updated 7 months ago
- Cheetah is a system that optimizes queries using programmable switches.☆20Jun 25, 2020Updated 5 years ago
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"☆72Jul 8, 2025Updated 8 months ago
- The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for E…☆29Feb 28, 2026Updated 2 weeks ago
- Principles and Methodologies for Serial Performance Optimization (OSDI '25)☆27Jun 5, 2025Updated 9 months ago
- Implementation of the Snappy compression algorithm as a RoCC accelerator☆12Jul 29, 2019Updated 6 years ago
- Expressive, Easy to Build, and High-Performance Application Networks☆19Jul 1, 2025Updated 8 months ago
- Study and reproduce classic multi-objective recommender-system models such as SharedBottom, ESMM, MMoE, and PLE☆41Jul 30, 2022Updated 3 years ago
- ☆15Aug 19, 2024Updated last year
- ☆28May 24, 2025Updated 9 months ago
- A minimal Diffusion Model built using only linear components☆28Sep 27, 2023Updated 2 years ago
- ☆10Oct 8, 2021Updated 4 years ago
- Example applications for the Department of Energy Computational Science Graduate Fellowship☆20Sep 11, 2025Updated 6 months ago
- This Python script performs a Model Predictive Control (MPC) simulation for vehicle lateral control using the CasADi framework. The main …☆19Feb 18, 2025Updated last year
- ☆14Sep 8, 2019Updated 6 years ago
- Network on chip based neural network accelerator☆10Mar 25, 2021Updated 4 years ago
- LLM-DSE: Searching Accelerator Parameters with LLM Agents☆13May 22, 2025Updated 9 months ago
- A simple cycle accurate template model for ASIC/FPGA hardware design. Including a cycle accurate FIFO design example. More designs are co…☆17Sep 5, 2019Updated 6 years ago
- ☆15Mar 24, 2023Updated 2 years ago
- ☆10Dec 28, 2020Updated 5 years ago
- ALCF Systems User Documentation☆29Mar 14, 2026Updated last week
- EfficientNet-L2 weights in Keras and retrieval script modified from qubvel/efficientnet☆24Mar 25, 2021Updated 4 years ago