VITA-Group/READ-ME

[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang

☆16

Alternatives and similar repositories for READ-ME

Users who are interested in READ-ME are comparing it to the libraries listed below.

DavidFanzz / SCMoE
☆29 · May 24, 2024 · Updated 2 years ago
YJHMITWEB / ExFlow
Explore Inter-layer Expert Affinity in MoE Model Inference
☆16 · May 6, 2024 · Updated 2 years ago
imagination-research / EEP
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆25 · Nov 11, 2025 · Updated 8 months ago
SJTU-IPADS / PipeLLM
☆28 · Dec 22, 2024 · Updated last year
GATECH-EIC / ShiftAddViT
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30 · Dec 6, 2023 · Updated 2 years ago
SkyworkAI / MoE-plus-plus
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
☆270 · Oct 16, 2024 · Updated last year
boringlee24 / socc22-miso
MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters
☆21 · Apr 21, 2023 · Updated 3 years ago
VITA-Group / LoCoCo
[ICML'2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen
☆17 · Sep 7, 2024 · Updated last year
tau-nlp / zero_scrolls
Running inference on the ZeroSCROLLS benchmark
☆22 · Apr 18, 2024 · Updated 2 years ago
SLDGroup / LBP-WHT
☆13 · Apr 27, 2024 · Updated 2 years ago
haizhongzheng / LTE
☆13 · Oct 13, 2025 · Updated 9 months ago
andyjm3 / SLTrain
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
☆39 · Nov 1, 2024 · Updated last year
hyhuang00 / moe_inference
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19 · Oct 30, 2024 · Updated last year
tianyi-lab / RoMA
Code for "Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs"
☆19 · Nov 6, 2025 · Updated 8 months ago
CCurl / J1
A Forth J1 emulator in C
☆13 · Nov 25, 2025 · Updated 7 months ago
UNITES-Lab / Occult
[ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…
☆13 · Apr 17, 2025 · Updated last year
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆123 · May 24, 2024 · Updated 2 years ago
sayakpaul / Denoised-Smoothing-TF
Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow.
☆20 · Aug 4, 2021 · Updated 4 years ago
Xingrun-Xing2 / EfficientLLM
A family of efficient edge language models in 100M~1B sizes.
☆19 · Feb 14, 2025 · Updated last year
vossstef / tang_nano_9k_vic20_lcd
Commodore VIC20 core for the Tang Nano 9K FPGA with LCD Output
☆12 · Feb 16, 2025 · Updated last year
wa1tnr / camelforth-rp2040-a
CamelForth in C for RP2040 Raspberry Pi Pico. A Forth by Dr Brad Rodriguez - ported to RP2040 by wa1tnr - Forth interpreter is on the RP2…
☆12 · Oct 27, 2021 · Updated 4 years ago
santhiyaskumar / FPGA_Codec2Encoder
Hardware Implementation of low-bit rate Codec, Codec2 in Verilog RTL on Cyclone IV FPGA.
☆15 · Mar 29, 2020 · Updated 6 years ago
Aaronhuang-778 / Mixture-Compressor-MoE
[ICLR 2025, IEEE TPAMI 2026] Mixture Compressor & MC#
☆75 · Feb 12, 2025 · Updated last year
sharonal10 / langint
☆10 · Jul 4, 2024 · Updated 2 years ago
Secbrain / RIDS
☆11 · Oct 7, 2023 · Updated 2 years ago
f8-arch / fpga-board-tutorials
Tutorials for getting started with an f8 softcore on an FPGA board
☆13 · Aug 17, 2025 · Updated 11 months ago
Summer-Summer / Kitty
Algorithm-System Co-design: accurate and efficient 2-bit KV cache quantization for LLM Inference.
☆17 · May 20, 2026 · Updated 2 months ago
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)"
☆47 · Feb 28, 2026 · Updated 4 months ago
Yifanfanfanfan / Reverse-Engineering-of-Imperceptible-Adversarial-Image-Perturbations
☆11 · Mar 31, 2022 · Updated 4 years ago
JeiKeiLim / simple_distribute_job
Simple distributed job scheduler for multiple servers with only SSH. No additions.
☆10 · Dec 8, 2022 · Updated 3 years ago
HeegyuKim / korouge
Adapts Google's official ROUGE implementation so that it can be used with Korean
☆17 · Jan 3, 2024 · Updated 2 years ago
shauli-ravfogel / adv-kernel-removal
☆12 · Oct 23, 2022 · Updated 3 years ago
PersoSim / de.persosim.simulator
PersoSim - the open source eID simulator
☆16 · May 21, 2026 · Updated 2 months ago
uiuctml / GOAT
[JMLR] Gradual Domain Adaptation: Theory and Algorithms
☆11 · Jan 14, 2025 · Updated last year
s-ball-10 / jailbreak_dynamics
☆25 · Jun 13, 2024 · Updated 2 years ago
VITA-Group / SteinDreamer
“SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity” by Peihao Wang, Zhiwen Fan, Dejia Xu, Dilin Wang,…
☆35 · Jan 5, 2024 · Updated 2 years ago
XuandongZhao / Ginsew
[ICML 2023] Protecting Language Generation Models via Invisible Watermarking
☆13 · Sep 8, 2023 · Updated 2 years ago
Ther-nullptr / circult-eda-mlsys-tinyml-arxiv-daily
🎓 Automatically updates circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours)
☆10 · Updated this week
JL-Cheng / SERE
[ICLR 2026] SERE: Similarity-Based Expert Re-routing for Efficient Batch Decoding in MoE Models
☆18 · Feb 4, 2026 · Updated 5 months ago