Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.
☆22 · Apr 25, 2024 · Updated last year
Alternatives and similar repositories for CPU-Free-model
Users who are interested in CPU-Free-model are comparing it to the libraries listed below
- My notes on various HPC papers. ☆26 · Jan 7, 2023 · Updated 3 years ago
- ☆18 · Nov 11, 2025 · Updated 3 months ago
- Tutorials for NVIDIA CUPTI samples ☆55 · Nov 3, 2025 · Updated 3 months ago
- GPUDirect Async suite ☆17 · Dec 5, 2018 · Updated 7 years ago
- ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system. ☆27 · Jul 6, 2023 · Updated 2 years ago
- Scalable radix top-k selection on GPUs. ☆21 · Jan 27, 2025 · Updated last year
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ☆40 · Feb 5, 2019 · Updated 7 years ago
- An ultra-fast, GPU-based large graph embedding algorithm utilizing a novel coarsening algorithm requiring not more than a single GPU. ☆24 · Jan 3, 2022 · Updated 4 years ago
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models ☆49 · Jan 23, 2026 · Updated last month
- Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult… ☆41 · Mar 17, 2024 · Updated last year
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph ☆56 · Jul 3, 2022 · Updated 3 years ago
- A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs. ☆22 · Feb 3, 2023 · Updated 3 years ago
- Benchmarks to capture important workloads. ☆32 · Feb 5, 2026 · Updated 3 weeks ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆69 · Sep 12, 2018 · Updated 7 years ago
- A safe Rust wrapper for VapourSynth. ☆34 · Jan 19, 2026 · Updated last month
- ☆32 · Aug 24, 2022 · Updated 3 years ago
- Sparsity support for PyTorch ☆38 · Mar 22, 2025 · Updated 11 months ago
- This project compares the performance of Swin-Transformer v2 implemented in JAX and PyTorch. ☆12 · Jun 8, 2022 · Updated 3 years ago
- easily run parts of avisynth script in external processes ☆12 · Oct 27, 2017 · Updated 8 years ago
- Digital SuperTwin: digital twin of supercomputers ☆13 · Nov 24, 2024 · Updated last year
- ☆11 · May 24, 2024 · Updated last year
- Code-Implementation-of-Super-Resolution-ZOO (image & video) ☆10 · Jul 6, 2020 · Updated 5 years ago
- zkSnark circuit compiler ☆12 · Feb 19, 2026 · Updated last week
- ☆41 · Mar 31, 2022 · Updated 3 years ago
- Hiding Images in Plain Sight: Deep Steganography ☆40 · Jun 18, 2018 · Updated 7 years ago
- Cross-platform D2V creator ☆39 · Sep 25, 2023 · Updated 2 years ago
- A tool for examining GPU scheduling behavior. ☆94 · Aug 17, 2024 · Updated last year
- CLTune: An automatic OpenCL & CUDA kernel tuner ☆185 · Dec 12, 2022 · Updated 3 years ago
- [TMM 2023] Official Implementation of "Bidirectional Translation Between UHD-HDR and HD-SDR Videos" ☆10 · Aug 8, 2024 · Updated last year
- Hyperspectral Image Super-Resolution via Adjacent Spectral Fusion Strategy ☆10 · Mar 31, 2022 · Updated 3 years ago
- ☆10 · Nov 8, 2021 · Updated 4 years ago
- ☆12 · Mar 17, 2020 · Updated 5 years ago
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration ☆34 · Jan 8, 2026 · Updated last month
- Sources of the tusistor TUI app and the rusistor lib. ☆14 · Dec 29, 2025 · Updated last month
- ReadMpls filter for VapourSynth ☆12 · Oct 5, 2021 · Updated 4 years ago
- AI Deinterlacing functions for Vapoursynth ☆17 · Nov 4, 2025 · Updated 3 months ago
- A patchless architecture, based on MLP-Mixer ☆18 · Dec 30, 2021 · Updated 4 years ago
- An NFT collection to commemorate players of the Curta team for their participation and performance in the 2023 Paradigm CTF. ☆11 · Nov 2, 2023 · Updated 2 years ago
- Enhanced version of the React.JS tutorial ☆11 · Mar 16, 2025 · Updated 11 months ago