shriramsb / vDNN
☆22 Nov 7, 2018 Updated 7 years ago
Alternatives and similar repositories for vDNN
Users that are interested in vDNN are comparing it to the libraries listed below
- Implementation of vDNN++; an improvement over vDNN ☆18 Dec 7, 2018 Updated 7 years ago
- ☆13 Feb 22, 2023 Updated 2 years ago
- Thinking is hard - automate it ☆18 Aug 24, 2022 Updated 3 years ago
- FTPipe and related pipeline model parallelism research. ☆44 May 16, 2023 Updated 2 years ago
- Repository to go along with the paper "Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines" ☆10 Mar 31, 2022 Updated 3 years ago
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo ☆11 Feb 12, 2023 Updated 3 years ago
- DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression ☆11 Oct 7, 2020 Updated 5 years ago
- ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system. ☆27 Jul 6, 2023 Updated 2 years ago
- ☆12 May 3, 2020 Updated 5 years ago
- A host-based framework that transparently extends the GPU addressable global memory space beyond the host memory using NVM-backed data po… ☆63 Sep 11, 2020 Updated 5 years ago
- [ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access ☆56 Aug 6, 2025 Updated 6 months ago
- PetPS: Supporting Huge Embedding Models with Tiered Memory ☆33 May 21, 2024 Updated last year
- ☆20 Nov 12, 2025 Updated 3 months ago
- ☆17 Dec 9, 2022 Updated 3 years ago
- ☆19 Jul 26, 2021 Updated 4 years ago
- Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference ☆19 Mar 5, 2023 Updated 2 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆137 Feb 21, 2022 Updated 3 years ago
- ☆21 Nov 29, 2022 Updated 3 years ago
- ☆18 Mar 15, 2020 Updated 5 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆127 May 9, 2022 Updated 3 years ago
- Hi-DMM: High-Performance Dynamic Memory Management in HLS (High-Level Synthesis) ☆25 Oct 30, 2018 Updated 7 years ago
- Code that accompanies the paper "Predicting the Computational Cost of Deep Learning Models" ☆21 Dec 14, 2018 Updated 7 years ago
- ☆26 Dec 5, 2022 Updated 3 years ago
- An experimental communicating attention kernel based on DeepEP. ☆35 Jul 29, 2025 Updated 6 months ago
- ☆23 Jun 5, 2019 Updated 6 years ago
- ☆24 Jun 21, 2023 Updated 2 years ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆58 Aug 21, 2024 Updated last year
- A hybrid cache sharing-partitioning tool for systems with Intel CAT support. ☆31 Mar 28, 2018 Updated 7 years ago
- ☆36 Jan 21, 2021 Updated 5 years ago
- Studying Xilinx FPGAs using the Zybo Z7-20 board ☆14 Mar 13, 2024 Updated last year
- Carbon Explorer helps evaluate solutions to make datacenters operate on renewable energy. ☆87 Nov 8, 2024 Updated last year
- Fine-grained GPU sharing primitives ☆148 Jul 28, 2025 Updated 6 months ago
- Continuous Pipelined Speculative Decoding ☆16 Jan 4, 2026 Updated last month
- A simple script to plot the Roofline model for given HW platforms and applications ☆10 Aug 22, 2024 Updated last year
- Anchored Diffusion Language Model (NeurIPS 2025) ☆27 Oct 13, 2025 Updated 4 months ago
- ☆11 Aug 23, 2023 Updated 2 years ago
- Python bindings for the NVML. Non-volatile memory for Python. ☆12 May 23, 2016 Updated 9 years ago
- Reinforcement Learning (PPO) applied to a multiplayer simple card game (Witches) ☆10 Jun 7, 2020 Updated 5 years ago
- ☆10 Dec 8, 2021 Updated 4 years ago