inclusionAI/asystem-amem

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/inclusionAI/asystem-amem)

inclusionAI / asystem-amem

A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.

☆111

Alternatives and similar repositories for asystem-amem

Users that are interested in asystem-amem are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

inclusionAI / AState
View on GitHub
☆41Updated this week
KuangjuX / AttnLink
View on GitHub
An experimental communicating attention kernel based on DeepEP.
☆34Jul 29, 2025Updated 11 months ago
antgroup / DeepXTrace
View on GitHub
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆101Jan 16, 2026Updated 6 months ago
inclusionAI / Awex
View on GitHub
A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin…
☆165Updated this week
sii-research / VCCL
View on GitHub
Venus Collective Communication Library, supported by SII and Infrawaves.
☆152Jun 24, 2026Updated last month
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
NVIDIA / nccl-extensions
View on GitHub
Communication patterns for AI, built on top of NCCL device and host APIs
☆18Updated this week
KuangjuX / NVSHMEM-Tutorial
View on GitHub
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195Feb 11, 2026Updated 5 months ago
nex-agi / NexVenusCL
View on GitHub
Nex Venus Communication Library
☆75Nov 17, 2025Updated 8 months ago
meta-pytorch / torchcomms
View on GitHub
torchcomms: a modern PyTorch communications API
☆380Updated this week
stepfun-ai / StepMesh
View on GitHub
☆378Jan 28, 2026Updated 5 months ago
ademeure / DeeperGEMM
View on GitHub
DeeperGEMM: crazy optimized version
☆86May 5, 2025Updated last year
perplexityai / pplx-kernels
View on GitHub
Perplexity GPU Kernels
☆591Nov 7, 2025Updated 8 months ago
aikitoria / nanotrace
View on GitHub
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137Jul 17, 2026Updated last week
flashinfer-ai / cubloaty
View on GitHub
a size profiler for cuda binary
☆71Jan 15, 2026Updated 6 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
NVIDIA / nvshmem
View on GitHub
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆563Updated this week
fzyzcjy / torch_memory_saver
View on GitHub
Allow torch tensor memory to be released and resumed later
☆260Updated this week
ademeure / cuda-side-boost
View on GitHub
☆60Feb 24, 2026Updated 5 months ago
infinigence / FUSCO
View on GitHub
High-performance distributed data shuffling (all-to-all) library for MoE training and inference
☆123Mar 7, 2026Updated 4 months ago
fzyzcjy / torch_utils
View on GitHub
Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…
☆114Sep 11, 2025Updated 10 months ago
uccl-project / mKernel
View on GitHub
mKernel: fast multi-node, multi-GPU fused kernels
☆254Jun 21, 2026Updated last month
tile-ai / tilescale
View on GitHub
Tile-based language built for AI computation across all scales
☆174Jun 16, 2026Updated last month
TransferQueue / TransferQueue
View on GitHub
[Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…
☆16Jan 16, 2026Updated 6 months ago
foundation-model-stack / vllm-triton-backend
View on GitHub
A Triton-only attention backend for vLLM
☆27Jul 14, 2026Updated last week
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
hao-ai-lab / DistCA
View on GitHub
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆106Apr 7, 2026Updated 3 months ago
lemyx / tilelang-dsa
View on GitHub
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47Nov 19, 2025Updated 8 months ago
cherichy / tilecute
View on GitHub
☆32Jul 2, 2025Updated last year
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 11 months ago
yifuwang / symm-mem-recipes
View on GitHub
☆170Dec 27, 2024Updated last year
ISEEKYAN / mbridge
View on GitHub
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆226Jun 15, 2026Updated last month
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,498Updated this week
osayamenja / FlashMoE
View on GitHub
Distributed MoE in a Single Kernel [NeurIPS '25]
☆273May 5, 2026Updated 2 months ago
Dao-AILab / sonic-moe
View on GitHub
Accelerating MoE with IO and Tile-aware Optimizations
☆732Jul 4, 2026Updated 3 weeks ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
perplexityai / pplx-garden
View on GitHub
Perplexity open source garden for inference technology
☆604May 27, 2026Updated last month
Zhaojp-Frank / AwesomePaper-for-AI
View on GitHub
Awesome system papers for AI
☆21Updated this week
uccl-project / uccl
View on GitHub
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g…
☆1,469Updated this week
mlc-ai / pith-train
View on GitHub
Compact and Agent-Native MoE Training System
☆297Updated this week
ai-dynamo / nixl
View on GitHub
NVIDIA Inference Xfer Library (NIXL)
☆1,147Updated this week
xinhao-luo / ClusterFusion
View on GitHub
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆75Dec 11, 2025Updated 7 months ago
meta-pytorch / kraken
View on GitHub
Triton-based Symmetric Memory operators and examples
☆106May 15, 2026Updated 2 months ago