Sample Codes using NVSHMEM on Multi-GPU
☆30 Jan 22, 2023 Updated 3 years ago
Alternatives and similar repositories for NVSHEMEM
Users that are interested in NVSHEMEM are comparing it to the libraries listed below.
- A lightweight design for computation-communication overlap. ☆225 Jan 20, 2026 Updated 2 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆22 Mar 18, 2026 Updated last week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆356 Dec 3, 2025 Updated 3 months ago
- ☆23 Jul 11, 2025 Updated 8 months ago
- A small RISC-V kernel coding by C, tested on sifive unmatched board. ☆16 Aug 20, 2022 Updated 3 years ago
- Website for CSE 234, Winter 2025 ☆13 Mar 24, 2025 Updated last year
- ☆53 Feb 24, 2026 Updated last month
- ☆119 May 16, 2025 Updated 10 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆172 Feb 11, 2026 Updated last month
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆13 Nov 23, 2024 Updated last year
- Expert Specialization MoE Solution based on CUTLASS ☆27 Jan 19, 2026 Updated 2 months ago
- ☆16 Feb 24, 2026 Updated last month
- Benchmark tests supporting the TiledCUDA library. ☆18 Nov 19, 2024 Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 Mar 24, 2025 Updated last year
- ☆13 Jan 7, 2025 Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 May 12, 2024 Updated last year
- NCCL Examples from Official NVIDIA NCCL Developer Guide. ☆20 May 29, 2018 Updated 7 years ago
- ring-attention experiments ☆168 Oct 17, 2024 Updated last year
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆73 Sep 8, 2024 Updated last year
- Github repository for "Big Data in Astrophysics" - Spring 2021 ☆15 Apr 26, 2021 Updated 4 years ago
- Official Repo of CudaForge ☆70 Dec 2, 2025 Updated 3 months ago
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221 ☆31 Apr 22, 2025 Updated 11 months ago
- study of cutlass ☆22 Nov 10, 2024 Updated last year
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆48 Updated this week
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training ☆31 May 2, 2025 Updated 10 months ago
- ASTR596: Fundamentals of Data Science at UIUC Astronomy, Spring 2023 ☆14 May 13, 2023 Updated 2 years ago
- Pure Triton kernels for Qwen3.5-27B inference on NVIDIA B200 ☆83 Feb 28, 2026 Updated last month
- Thunder Research Group's Collective Communication Library ☆50 Jul 8, 2025 Updated 8 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆98 Sep 19, 2025 Updated 6 months ago
- ☆13 Jan 28, 2026 Updated 2 months ago
- Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025 ☆31 Oct 22, 2025 Updated 5 months ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆96 Feb 20, 2026 Updated last month
- [ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniv… ☆27 Jun 16, 2025 Updated 9 months ago
- My Paper Reading Lists and Notes. ☆21 Mar 13, 2026 Updated 2 weeks ago
- Student handbook for the Applied Galactic Dynamics School at the Flatiron Institute (2021) ☆11 Jul 6, 2021 Updated 4 years ago
- PyTorch implementation of the Flash Spectral Transform Unit. ☆22 Sep 19, 2024 Updated last year
- ☆94 May 31, 2025 Updated 9 months ago
- Github repository for "Big Data in Astrophysics" - Spring 2022 ☆14 Apr 27, 2022 Updated 3 years ago
- Asynchronous pipeline parallel optimization ☆19 Feb 2, 2026 Updated last month