An NCCL extension library designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆106 · Dec 17, 2025 · Updated 5 months ago
Alternatives and similar repositories for asystem-amem
Users that are interested in asystem-amem are comparing it to the libraries listed below.
- ☆37 · Dec 9, 2025 · Updated 5 months ago
- An experimental communicating attention kernel based on DeepEP. ☆34 · Jul 29, 2025 · Updated 9 months ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆536 · May 5, 2026 · Updated 2 weeks ago
- Tutorials for NVIDIA CUPTI samples ☆67 · Nov 3, 2025 · Updated 6 months ago
- ☆20 · Nov 18, 2023 · Updated 2 years ago
- An ultra-fast, distributed Safetensors loader ☆55 · May 18, 2026 · Updated last week
- ☆14 · Mar 15, 2026 · Updated 2 months ago
- ☆30 · Apr 8, 2026 · Updated last month
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ☆57 · Oct 11, 2025 · Updated 7 months ago
- ☆28 · May 11, 2026 · Updated last week
- Large language models to diffusion finetuning code ☆26 · Jun 2, 2025 · Updated 11 months ago
- ☆66 · Apr 26, 2025 · Updated last year
- ☆167 · Dec 27, 2024 · Updated last year
- ☆57 · Feb 24, 2026 · Updated 3 months ago
- Composable and Embeddable Communication Runtime for Distributed AI Services ☆101 · Updated this week
- ☆18 · Nov 11, 2025 · Updated 6 months ago
- Pytorch routines for (Ker)nel (Mac)hines ☆12 · Oct 10, 2025 · Updated 7 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆190 · Feb 11, 2026 · Updated 3 months ago
- ☆13 · Jan 7, 2025 · Updated last year
- DeeperGEMM: crazy optimized version ☆86 · May 5, 2025 · Updated last year
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin… ☆150 · Apr 22, 2026 · Updated last month
- NVIDIA Inference Xfer Library (NIXL) ☆1,041 · Updated this week
- A collection of workload implementations for the LDBC SNB benchmark driver ☆20 · Jun 7, 2021 · Updated 4 years ago
- GPUDirect Async support for IB Verbs ☆137 · Nov 10, 2022 · Updated 3 years ago
- a simple API to use CUPTI ☆10 · Aug 19, 2025 · Updated 9 months ago
- ☆113 · Updated this week
- A lightweight design for computation-communication overlap. ☆234 · Jan 20, 2026 · Updated 4 months ago
- ☆451 · Aug 10, 2025 · Updated 9 months ago
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER. ☆62 · Nov 24, 2025 · Updated 6 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,309 · Aug 28, 2025 · Updated 8 months ago
- Ring attention implementation with flash attention ☆1,020 · Sep 10, 2025 · Updated 8 months ago
- train a model on huchenfeng dataset ☆52 · Dec 8, 2025 · Updated 5 months ago
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- ☆27 · Aug 31, 2023 · Updated 2 years ago
- A simple demo for using Sentinel with Spring Cloud Alibaba ☆17 · Nov 8, 2018 · Updated 7 years ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆97 · Jan 16, 2026 · Updated 4 months ago
- Fastest kernels written from scratch ☆578 · Sep 18, 2025 · Updated 8 months ago
- CUDA 12.2 HMM demos ☆21 · Jul 26, 2024 · Updated last year
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆71 · Dec 11, 2025 · Updated 5 months ago