An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆105 · Dec 17, 2025 · Updated 4 months ago
Alternatives and similar repositories for asystem-amem
Users who are interested in asystem-amem are comparing it to the libraries listed below.
- ☆37 · Dec 9, 2025 · Updated 4 months ago
- An experimental communicating attention kernel based on DeepEP. ☆34 · Jul 29, 2025 · Updated 9 months ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆519 · Updated this week
- Tutorials for NVIDIA CUPTI samples ☆64 · Nov 3, 2025 · Updated 6 months ago
- ☆20 · Nov 18, 2023 · Updated 2 years ago
- An ultra-fast, distributed Safetensors loader ☆45 · Apr 27, 2026 · Updated last week
- ☆11 · Mar 15, 2026 · Updated last month
- ☆30 · Apr 8, 2026 · Updated 3 weeks ago
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ☆57 · Oct 11, 2025 · Updated 6 months ago
- ☆27 · Apr 27, 2026 · Updated last week
- ☆165 · Dec 27, 2024 · Updated last year
- ☆57 · Feb 24, 2026 · Updated 2 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆98 · Updated this week
- ☆18 · Nov 11, 2025 · Updated 5 months ago
- PyTorch routines for (Ker)nel (Mac)hines ☆12 · Oct 10, 2025 · Updated 6 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆180 · Feb 11, 2026 · Updated 2 months ago
- ☆13 · Jan 7, 2025 · Updated last year
- DeeperGEMM: crazy optimized version ☆86 · May 5, 2025 · Updated 11 months ago
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin… ☆150 · Apr 22, 2026 · Updated last week
- NVIDIA Inference Xfer Library (NIXL) ☆1,011 · Updated this week
- A collection of workload implementations for the LDBC SNB benchmark driver ☆20 · Jun 7, 2021 · Updated 4 years ago
- GPUDirect Async support for IB Verbs ☆137 · Nov 10, 2022 · Updated 3 years ago
- a simple API to use CUPTI ☆10 · Aug 19, 2025 · Updated 8 months ago
- ☆453 · Aug 10, 2025 · Updated 8 months ago
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER. ☆61 · Nov 24, 2025 · Updated 5 months ago
- train a model on huchenfeng dataset ☆52 · Dec 8, 2025 · Updated 4 months ago
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- ☆27 · Aug 31, 2023 · Updated 2 years ago
- CUDA 12.2 HMM demos ☆21 · Jul 26, 2024 · Updated last year
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆68 · Dec 11, 2025 · Updated 4 months ago
- Important experiments on memory management, file access, network transfer, job scheduler, and so on. ☆15 · Apr 27, 2022 · Updated 4 years ago
- UBio-MolFM is a foundation model suite for molecular modeling, developed by the UBio-MolFM team. ☆26 · Apr 13, 2026 · Updated 3 weeks ago
- Prefix-Aware Attention for LLM Decoding ☆37 · Mar 31, 2026 · Updated last month
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,295 · Aug 28, 2025 · Updated 8 months ago
- Supplementary material for our paper "Compute Trends Across Three Eras of Machine Learning". ☆47 · Mar 12, 2022 · Updated 4 years ago
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆15 · Jan 16, 2026 · Updated 3 months ago
- Ring attention implementation with flash attention ☆1,015 · Sep 10, 2025 · Updated 7 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆249 · Apr 27, 2026 · Updated last week
- eBPF for GPU UVM offloading and scheduling in Linux kernel ☆51 · Apr 15, 2026 · Updated 2 weeks ago