ross-tsenov / rebuild-pytorch-tensor-from-pointer
A code sample demonstrating how to share and rebuild a PyTorch GPU tensor via its pointer/reference between different processes.
☆12 Updated 10 months ago
Alternatives and similar repositories for rebuild-pytorch-tensor-from-pointer
Users who are interested in rebuild-pytorch-tensor-from-pointer are comparing it to the repositories listed below.
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆12 Updated 2 weeks ago
- A deep learning intermediate representation for multi-platform compilation optimization ☆10 Updated 8 months ago
- ☆11 Updated 6 months ago
- ☆11 Updated 9 months ago
- A lightweight design for computation-communication overlap. ☆148 Updated 3 weeks ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆48 Updated 2 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆79 Updated 8 months ago
- ☆19 Updated 3 months ago
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO ☆30 Updated last year
- High performance Transformer implementation in C++. ☆126 Updated 6 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆113 Updated last week
- ☆80 Updated 3 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆383 Updated this week
- Fastest kernels written from scratch ☆292 Updated 3 months ago
- ☆106 Updated 8 months ago
- Fast OS-level support for GPU checkpoint and restore ☆217 Updated this week
- CUTLASS and CuTe Examples ☆63 Updated this week
- DRAM/SSD hybrid caching system ☆14 Updated 4 months ago
- ☆79 Updated 2 years ago
- Thunder Research Group's Collective Communication Library ☆38 Updated last week
- Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc. ☆15 Updated 4 months ago
- Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy" ☆13 Updated 4 months ago
- ☆100 Updated 6 months ago
- ☆50 Updated last month
- A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems ☆186 Updated 9 months ago
- DeepSeek-V3/R1 inference performance simulator ☆155 Updated 3 months ago
- Perplexity GPU Kernels ☆405 Updated this week
- Ultra and Unified CCL ☆413 Updated this week
- ☆87 Updated 2 months ago
- DeeperGEMM: crazy optimized version ☆69 Updated 2 months ago