[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang
☆16Dec 16, 2024Updated last year
Alternatives and similar repositories for READ-ME
Users that are interested in READ-ME are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆29May 24, 2024Updated 2 years ago
- Code release for AdapMoE accepted by ICCAD 2024☆39Apr 28, 2025Updated last year
- Explore Inter-layer Expert Affinity in MoE Model Inference☆16May 6, 2024Updated 2 years ago
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs☆25Nov 11, 2025Updated 7 months ago
- based on Differential Cryptanalysis of the Full 16-round DES(Eli Biham / Adi Shamir), all comments and report are written in Korean.☆10Dec 19, 2022Updated 3 years ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- [ICLR 2025] DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models☆19Mar 25, 2025Updated last year
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer☆30Dec 6, 2023Updated 2 years ago
- [ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts☆269Oct 16, 2024Updated last year
- MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters☆21Apr 21, 2023Updated 3 years ago
- [ICML‘2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen☆17Sep 7, 2024Updated last year
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆162Jul 9, 2025Updated 11 months ago
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training☆32May 2, 2025Updated last year
- Running inference on the ZeroSCROLLS benchmark☆22Apr 18, 2024Updated 2 years ago
- BERT-based GEC tagging for Japanese☆19Aug 4, 2023Updated 2 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…☆16Apr 21, 2025Updated last year
- ☆13Oct 13, 2025Updated 8 months ago
- GPU operators for sparse tensor operations☆37Mar 11, 2024Updated 2 years ago
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".☆19Oct 30, 2024Updated last year
- ☆13Apr 27, 2024Updated 2 years ago
- [ICML‘25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…☆13Apr 17, 2025Updated last year
- An OpenAI API compatible images server to generate or manipulate images.☆18Feb 2, 2025Updated last year
- 蝦米輸入法☆27Jun 6, 2026Updated 3 weeks ago
- [NeurIPS 2022] "Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets" by Ruisi Cai*, Zhenyu Zh…☆21Oct 1, 2022Updated 3 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files☆11Oct 27, 2024Updated last year
- [ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron☆33Apr 30, 2025Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models☆120May 24, 2024Updated 2 years ago
- 模型加速/模型压缩(已完成所有Lab)☆11Dec 24, 2023Updated 2 years ago
- Utility programs to pipe data across a RDMA-capable network☆19Mar 14, 2026Updated 3 months ago
- A family of efficient edge language models in 100M~1B sizes.☆19Feb 14, 2025Updated last year
- ☆14Mar 17, 2023Updated 3 years ago
- [ICML 2021] "Efficient Lottery Ticket Finding: Less Data is More" by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang☆26Dec 30, 2021Updated 4 years ago
- Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow.☆20Aug 4, 2021Updated 4 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆18Jul 21, 2024Updated last year
- Google 공식 Rouge Implementation을 한국어에서 사용할 수 있도록 처리☆17Jan 3, 2024Updated 2 years ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆16Feb 4, 2025Updated last year
- [NeurIPS'21] "AugMax: Adversarial Composition of Random Augmentations for Robust Training" by Haotao Wang, Chaowei Xiao, Jean Kossaifi, Z…☆125Dec 29, 2021Updated 4 years ago
- JAX implementation of LLaMA, aiming to train LLaMA on Google Cloud TPU☆14Jul 22, 2023Updated 2 years ago
- Collection of words to help with writing high level applications with Forth language.☆12Jun 27, 2021Updated 5 years ago
- Understanding deep networks and large models.☆29Jan 23, 2026Updated 5 months ago