Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616
☆133 · Jul 6, 2023 · Updated 2 years ago
Alternatives and similar repositories for dtr-prototype
Users who are interested in dtr-prototype are comparing it to the libraries listed below.
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class. ☆10 · Feb 10, 2022 · Updated 4 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆127 · May 9, 2022 · Updated 3 years ago
- Re-implementation of the TASO compiler using equality saturation ☆138 · Jun 28, 2021 · Updated 4 years ago
- Research and development for optimizing transformers ☆131 · Feb 16, 2021 · Updated 5 years ago
- Fine-grained GPU sharing primitives ☆148 · Jul 28, 2025 · Updated 7 months ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆137 · Feb 21, 2022 · Updated 4 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Mar 21, 2022 · Updated 3 years ago
- DELTA-pytorch: DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation ☆12 · Apr 16, 2024 · Updated last year
- TensorFlow and TVM integration ☆36 · Apr 27, 2020 · Updated 5 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆125 · Jun 23, 2022 · Updated 3 years ago
- Term project for TaPL. A mini coq-like proof assistant. ☆17 · Jun 17, 2018 · Updated 7 years ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆17 · Aug 4, 2022 · Updated 3 years ago
- A framework that helps implementing swizzle GPU kernels ☆51 · Feb 29, 2020 · Updated 6 years ago
- Equivalent and redundant mutant detection with e-graphs!!! ☆13 · Jun 14, 2023 · Updated 2 years ago
- Slicing a PyTorch Tensor Into Parallel Shards ☆300 · Jun 7, 2025 · Updated 8 months ago
- this is the release repository of superneurons ☆54 · Feb 13, 2021 · Updated 5 years ago
- MONeT framework for reducing memory consumption of DNN training ☆174 · May 4, 2021 · Updated 4 years ago
- ☆11 · Apr 5, 2021 · Updated 4 years ago
- ☆42 · Sep 8, 2023 · Updated 2 years ago
- Haskell experiments involving TVM AI framework ☆20 · Apr 26, 2019 · Updated 6 years ago
- Depict GPU memory footprint during DNN training of PyTorch ☆11 · Nov 17, 2022 · Updated 3 years ago
- Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult… ☆41 · Mar 17, 2024 · Updated last year
- ☆36 · Dec 9, 2024 · Updated last year
- An experimental ahead of time compiler for Relay. ☆49 · Apr 21, 2020 · Updated 5 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆283 · Dec 17, 2025 · Updated 2 months ago
- Model-less Inference Serving ☆94 · Nov 4, 2023 · Updated 2 years ago
- BytePS examples (Vision, NLP, GAN, etc) ☆19 · Nov 24, 2022 · Updated 3 years ago
- An extension of TVMScript to write simple and high performance GPU kernels with tensorcore. ☆51 · Jul 23, 2024 · Updated last year
- ☆192 · Mar 28, 2023 · Updated 2 years ago
- ☆23 · Apr 28, 2023 · Updated 2 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆24 · Nov 21, 2024 · Updated last year
- ☆145 · Jan 30, 2025 · Updated last year
- ☆13 · Nov 1, 2021 · Updated 4 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆62 · Jul 1, 2022 · Updated 3 years ago
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training ☆199 · Dec 22, 2022 · Updated 3 years ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching". ☆14 · Sep 1, 2022 · Updated 3 years ago
- Visualize TVM Relay program graph ☆12 · Nov 19, 2019 · Updated 6 years ago
- The Tensor Algebra SuperOptimizer for Deep Learning ☆739 · Jan 26, 2023 · Updated 3 years ago
- TVMScript kernel for deformable attention ☆25 · Dec 15, 2021 · Updated 4 years ago