ross-tsenov / rebuild-pytorch-tensor-from-pointer
A code sample demonstrating how to share and rebuild a PyTorch GPU tensor via its pointer/reference between different processes.
☆13Updated 11 months ago
Alternatives and similar repositories for rebuild-pytorch-tensor-from-pointer
Users who are interested in rebuild-pytorch-tensor-from-pointer are comparing it to the libraries listed below
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)☆12Updated last month
- A lightweight design for computation-communication overlap.☆155Updated last month
- ☆129Updated 8 months ago
- ☆228Updated last year
- High performance Transformer implementation in C++.☆129Updated 6 months ago
- CUTLASS and CuTe Examples☆68Updated 3 weeks ago
- ☆98Updated 2 months ago
- ☆129Updated 3 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications☆394Updated this week
- Dynamic Memory Management for Serving LLMs without PagedAttention☆405Updated 2 months ago
- An Easy-to-understand TensorOp Matmul Tutorial☆370Updated 10 months ago
- Examples of CUDA implementations by Cutlass CuTe☆218Updated last month
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving☆50Updated 2 weeks ago
- Implement Flash Attention using Cute.☆92Updated 7 months ago
- A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.☆56Updated last week
- Experimental projects related to TensorRT☆110Updated this week
- nnScaler: Compiling DNN models for Parallel Training☆114Updated last week
- Fastest kernels written from scratch☆311Updated 4 months ago
- flash attention tutorial written in python, triton, cuda, cutlass☆402Updated 2 months ago
- DeeperGEMM: crazy optimized version☆71Updated 3 months ago
- ☆106Updated 7 months ago
- ☆52Updated 3 weeks ago
- ☆60Updated 3 months ago
- ☆51Updated 2 months ago
- Thunder Research Group's Collective Communication Library☆39Updated last month
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO☆31Updated last year
- ☆36Updated 3 weeks ago
- An experimental CPU backend for Triton☆139Updated 2 months ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆183Updated 6 months ago
- Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc.☆16Updated 4 months ago