ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.
☆27 · Jul 6, 2023 · Updated 2 years ago
Alternatives and similar repositories for ComScribe
Users who are interested in ComScribe are comparing it to the libraries listed below.
- FTPipe and related pipeline model parallelism research. ☆44 · May 16, 2023 · Updated 2 years ago
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme… ☆22 · Apr 25, 2024 · Updated last year
- ☆22 · Nov 7, 2018 · Updated 7 years ago
- A curated list of amazingly awesome database libraries and resources. ☆15 · Dec 14, 2023 · Updated 2 years ago
- [ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access ☆56 · Aug 6, 2025 · Updated 6 months ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆127 · May 9, 2022 · Updated 3 years ago
- ☆15 · Apr 20, 2022 · Updated 3 years ago
- ☆19 · May 10, 2025 · Updated 9 months ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- Implementation of vDNN++; an improvement over vDNN ☆18 · Dec 7, 2018 · Updated 7 years ago
- My notes on various HPC papers. ☆26 · Jan 7, 2023 · Updated 3 years ago
- An experimental parallel training platform ☆56 · Mar 25, 2024 · Updated last year
- ☆26 · Aug 31, 2023 · Updated 2 years ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021 ☆55 · Jul 21, 2021 · Updated 4 years ago
- ☆25 · Apr 3, 2023 · Updated 2 years ago
- ☆27 · May 31, 2023 · Updated 2 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆69 · Sep 12, 2018 · Updated 7 years ago
- nnScaler: Compiling DNN models for Parallel Training ☆124 · Sep 23, 2025 · Updated 5 months ago
- Multi-GPU communication profiler and visualizer ☆38 · Jun 10, 2024 · Updated last year
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆70 · Mar 20, 2025 · Updated 11 months ago
- This repository contains the dataset of our ISSTA 2018 paper: An Empirical Study on TensorFlow Program Bugs. ☆29 · May 20, 2020 · Updated 5 years ago
- ☆78 · May 4, 2021 · Updated 4 years ago
- Cloud native connectivity for Unreal Engine ☆10 · Apr 14, 2023 · Updated 2 years ago
- PetPS: Supporting Huge Embedding Models with Tiered Memory ☆33 · May 21, 2024 · Updated last year
- Dirigent: Lightweight Serverless Orchestration ☆41 · Aug 26, 2025 · Updated 6 months ago
- ☆84 · Dec 2, 2022 · Updated 3 years ago
- ddl-benchmarks: Benchmarks for Distributed Deep Learning ☆36 · May 29, 2020 · Updated 5 years ago
- An extension of rCUDA that enables remote-to-local GPU migration ☆40 · Sep 28, 2016 · Updated 9 years ago
- A simple script to plot the Roofline model for given HW platforms and applications ☆10 · Aug 22, 2024 · Updated last year
- ☆23 · Jan 27, 2014 · Updated 12 years ago
- Continuous Pipelined Speculative Decoding ☆16 · Jan 4, 2026 · Updated last month
- Notes and examples to get started with parallel computing with CUDA. ☆13 · Nov 1, 2019 · Updated 6 years ago
- Sackerel is an integration tool for Mackerel and SakuraCloud ☆10 · Feb 12, 2018 · Updated 8 years ago
- zkSnark circuit compiler ☆12 · Feb 19, 2026 · Updated last week
- DELTA-pytorch: DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation ☆12 · Apr 16, 2024 · Updated last year
- Digital SuperTwin: digital twin of supercomputers ☆13 · Nov 24, 2024 · Updated last year
- ☆41 · Apr 25, 2024 · Updated last year
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ☆40 · Feb 5, 2019 · Updated 7 years ago
- ☆38 · Jan 15, 2021 · Updated 5 years ago