GPU-accelerated LLM Training Simulator
☆17Jun 26, 2025Updated 8 months ago
Alternatives and similar repositories for multiverse
Users who are interested in multiverse are comparing it to the libraries listed below.
- ☆16Mar 18, 2025Updated 11 months ago
- An external memory allocator example for PyTorch.☆16Aug 10, 2025Updated 6 months ago
- ☆21Apr 2, 2023Updated 2 years ago
- TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches☆80Jul 25, 2023Updated 2 years ago
- ☆47Dec 13, 2024Updated last year
- ☆21May 13, 2022Updated 3 years ago
- ☆22Nov 3, 2025Updated 3 months ago
- ☆23Apr 28, 2024Updated last year
- ☆20Jun 29, 2022Updated 3 years ago
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning☆32Jun 13, 2025Updated 8 months ago
- An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design☆22Dec 13, 2024Updated last year
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]☆24Nov 21, 2024Updated last year
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"☆77Oct 15, 2025Updated 4 months ago
- ☆24Jul 7, 2024Updated last year
- Personal Digest of NAS (Under Construction 🛠)☆25Nov 24, 2020Updated 5 years ago
- ☆11Dec 19, 2021Updated 4 years ago
- An Automated Performance Optimization Framework for P4-Programmable SmartNICs☆28Nov 18, 2023Updated 2 years ago
- The prototype for NSDI paper "NetHint: White-Box Networking for Multi-Tenant Data Centers"☆26Feb 2, 2024Updated 2 years ago
- A Hybrid Framework to Build High-performance Adaptive Neural Networks for Kernel Datapath☆28May 15, 2023Updated 2 years ago
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication☆29Jul 23, 2023Updated 2 years ago
- Venus Collective Communication Library, supported by SII and Infrawaves.☆138Updated this week
- A superoptimizing compiler for packet-processing☆30Jun 16, 2023Updated 2 years ago
- Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware"☆28Feb 18, 2022Updated 4 years ago
- ☆32Jul 11, 2022Updated 3 years ago
- ☆32Aug 21, 2021Updated 4 years ago
- MATLAB/Octave generator of Hamming ECC coding. Output format is Verilog HDL.☆12Dec 27, 2022Updated 3 years ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup☆35Jan 9, 2023Updated 3 years ago
- ☆12Aug 26, 2022Updated 3 years ago
- Prefix-Aware Attention for LLM Decoding☆29Jan 23, 2026Updated last month
- ☆43Mar 31, 2025Updated 11 months ago
- ☆37Apr 15, 2023Updated 2 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago
- ☆12Aug 26, 2016Updated 9 years ago
- ☆10Nov 1, 2021Updated 4 years ago
- netbeacon - monitoring your network capture, NIDS or network analysis process☆19Oct 26, 2013Updated 12 years ago
- Verilog implementation of MC68851 Memory Management Unit☆13Feb 26, 2018Updated 8 years ago
- A collection of classic neural network papers☆12Aug 11, 2024Updated last year
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks.☆14Oct 3, 2022Updated 3 years ago
- Parse data and generate plotting scripts based on plotly.☆11Dec 8, 2025Updated 2 months ago