ross-tsenov / rebuild-pytorch-tensor-from-pointer
A code sample demonstrating how to share and rebuild a PyTorch GPU tensor via its pointer/reference between different processes.
☆13 Updated last year
Alternatives and similar repositories for rebuild-pytorch-tensor-from-pointer
Users who are interested in rebuild-pytorch-tensor-from-pointer are comparing it to the repositories listed below.
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO ☆31 Updated last year
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆12 Updated 2 months ago
- An experimental CPU backend for Triton ☆146 Updated 3 months ago
- A lightweight design for computation-communication overlap. ☆161 Updated this week
- Experimental projects related to TensorRT ☆111 Updated last week
- ☆28 Updated 2 months ago
- CUTLASS and CuTe Examples ☆72 Updated last month
- A Quirky Assortment of CuTe Kernels ☆435 Updated this week
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆79 Updated 9 months ago
- ☆136 Updated 3 months ago
- ☆115 Updated 8 months ago
- Fastest kernels written from scratch ☆318 Updated 5 months ago
- ☆229 Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆412 Updated this week
- ☆36 Updated last month
- High performance Transformer implementation in C++. ☆129 Updated 7 months ago
- ☆101 Updated 3 months ago
- ☆28 Updated last year
- nnScaler: Compiling DNN models for Parallel Training ☆118 Updated this week
- Thunder Research Group's Collective Communication Library ☆40 Updated last month
- Shared Middle-Layer for Triton Compilation ☆280 Updated this week
- ☆63 Updated 4 months ago
- A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache. ☆58 Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆63 Updated 2 months ago
- DeeperGEMM: crazy optimized version ☆70 Updated 4 months ago
- ☆131 Updated 9 months ago
- Perplexity GPU Kernels ☆451 Updated 3 weeks ago
- Collection of benchmarks to measure basic GPU capabilities ☆412 Updated 6 months ago
- Microsoft Collective Communication Library ☆67 Updated 9 months ago
- Development repository for the Triton-Linalg conversion ☆197 Updated 6 months ago