Inference framework for MoE layers based on TensorRT with Python binding
☆41May 31, 2021Updated 4 years ago
Alternatives and similar repositories for InfMoE
Users who are interested in InfMoE are comparing it to the libraries listed below:
- Compiler for Dynamic Neural Networks☆45Nov 13, 2023Updated 2 years ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Feb 23, 2024Updated 2 years ago
- Prefix-Aware Attention for LLM Decoding☆29Jan 23, 2026Updated last month
- This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value …☆22Oct 20, 2024Updated last year
- A simple MIPS CPU for BUAA CO course (and now NSCSCC).☆10May 15, 2021Updated 4 years ago
- ☆89Apr 2, 2022Updated 3 years ago
- A hand-written recursive descent Verilog parser.☆10Jan 30, 2026Updated last month
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]☆24Nov 21, 2024Updated last year
- Binary translation in Rust☆13Jun 22, 2020Updated 5 years ago
- Compiling finite generators to digital logic. WIP☆13Aug 24, 2020Updated 5 years ago
- A language and compiler for irregular tensor programs.☆152Nov 29, 2024Updated last year
- ☆84Feb 5, 2026Updated 3 weeks ago
- A fast MoE impl for PyTorch☆1,840Feb 10, 2025Updated last year
- A router IP written in Verilog.☆12Dec 20, 2019Updated 6 years ago
- CSS-LM: Contrastive Semi-supervised Fine-tuning of Pre-trained Language Models☆12Jul 1, 2023Updated 2 years ago
- ☆19Jun 1, 2025Updated 9 months ago
- An auxiliary project analyzing the characteristics of KV in DiT attention.☆33Nov 29, 2024Updated last year
- Documentation for TCP Lab☆12May 20, 2025Updated 9 months ago
- PKU CompNet'19 Lab 2 - Homebrew TCP☆12Nov 29, 2019Updated 6 years ago
- An Attention Superoptimizer☆22Jan 20, 2025Updated last year
- [AFK] Hardware router in Chisel (THU Network Joint Lab 2020)☆14Oct 8, 2020Updated 5 years ago
- The libMF source files with comments in Chinese.☆29May 25, 2014Updated 11 years ago
- ☆14Mar 26, 2020Updated 5 years ago
- Training camp project for the training track☆26Jan 28, 2026Updated last month
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …☆36Aug 29, 2025Updated 6 months ago
- An external memory allocator example for PyTorch.☆16Aug 10, 2025Updated 6 months ago
- ☆17May 10, 2024Updated last year
- PilotFish harvests the free GPU cycles of cloud gaming with deep learning training☆14Jul 2, 2022Updated 3 years ago
- Benchmark PyTorch Custom Operators☆14Jul 6, 2023Updated 2 years ago
- ☆38Oct 11, 2025Updated 4 months ago
- This project makes it easy to test ncnn models and even deploy ncnn projects in Python to speed things up☆11Jul 27, 2019Updated 6 years ago
- ☆14Jan 12, 2022Updated 4 years ago
- Reading seminar of the Harvard Cloud Networking and Systems Group☆16Aug 29, 2022Updated 3 years ago
- Yet another Polyhedra Compiler for DeepLearning☆19Apr 14, 2023Updated 2 years ago
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling☆16Sep 27, 2023Updated 2 years ago
- ☆16Apr 22, 2025Updated 10 months ago
- AI model training on heterogeneous, geo-distributed resources☆38Nov 24, 2025Updated 3 months ago
- Code for CPM-2 Pre-Train☆158Mar 18, 2023Updated 2 years ago
- Explore Inter-layer Expert Affinity in MoE Model Inference☆16May 6, 2024Updated last year