perplexityai/pplx-garden

perplexityai / pplx-garden

Perplexity open source garden for inference technology

☆371

Alternatives and similar repositories for pplx-garden

Users who are interested in pplx-garden compare it to the libraries listed below.

DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆92 · Jan 26, 2026 · Updated last month
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆567 · Nov 7, 2025 · Updated 4 months ago
hao-ai-lab / DistCA
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆92 · Updated this week
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆355 · Updated this week
nex-agi / NexVenusCL
Nex Venus Communication Library
☆72 · Nov 17, 2025 · Updated 3 months ago
eth-easl / sailor
AI model training on heterogeneous, geo-distributed resources
☆38 · Nov 24, 2025 · Updated 3 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆898 · Feb 28, 2026 · Updated last week
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆165 · Feb 11, 2026 · Updated 3 weeks ago
host-architecture / Fast-and-Safe-IO-Memory-Protection
☆13 · Nov 21, 2024 · Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆223 · Jan 20, 2026 · Updated last month
sspec-project / SparseSpec
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆93 · Dec 2, 2025 · Updated 3 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,371 · Feb 13, 2026 · Updated 3 weeks ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆475 · Feb 28, 2026 · Updated last week
fpgasystems / Chameleon-RAG-Acceleration
☆19 · Jun 1, 2025 · Updated 9 months ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,057 · Updated this week
uccl-project / uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g…
☆1,224 · Feb 28, 2026 · Updated last week
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆396 · Nov 13, 2025 · Updated 3 months ago
InfiniTensor / TinyInfiniTrain
Training camp project for the training track
☆26 · Jan 28, 2026 · Updated last month
acryl-aaai / perf
☆14 · Dec 13, 2024 · Updated last year
HydraQYH / hp_rms_norm
High-performance RMSNorm implementation using SM core storage (registers and shared memory)
☆30 · Jan 22, 2026 · Updated last month
flashinfer-ai / flashinfer-bench
Building the Virtuous Cycle for AI-driven LLM Systems
☆192 · Feb 27, 2026 · Updated last week
osayamenja / FlashMoE
Distributed MoE in a Single Kernel [NeurIPS '25]
☆194 · Feb 27, 2026 · Updated last week
xdit-project / DiTCacheAnalysis
An auxiliary project analyzing the characteristics of KV in DiT attention.
☆33 · Nov 29, 2024 · Updated last year
infinigence / FUSCO
High-performance distributed data shuffling (all-to-all) library for MoE training and inference
☆112 · Feb 28, 2026 · Updated last week
SJTU-IPADS / PhoenixOS
Fast OS-level support for GPU checkpoint and restore
☆271 · Sep 28, 2025 · Updated 5 months ago
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆6,154 · Feb 28, 2026 · Updated last week
liangyuRain / ForestColl
☆16 · Apr 22, 2025 · Updated 10 months ago
jasperzhong / swift
☆15 · Apr 20, 2022 · Updated 3 years ago
stepfun-ai / StepMesh
☆347 · Jan 28, 2026 · Updated last month
sii-research / VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
☆138 · Updated this week
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,264 · Aug 28, 2025 · Updated 6 months ago
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆47 · Jul 8, 2025 · Updated 7 months ago
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆469 · Feb 28, 2026 · Updated last week
zhuzilin / flash-attention-with-sink
☆38 · Aug 7, 2025 · Updated 7 months ago
mit-han-lab / fastrl
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆149 · Feb 27, 2026 · Updated last week
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆66 · Dec 11, 2025 · Updated 2 months ago
mlc-ai / mlc-python
☆37 · Jul 19, 2025 · Updated 7 months ago
flashserve / PAT
Prefix-Aware Attention for LLM Decoding
☆29 · Jan 23, 2026 · Updated last month
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆262 · Nov 18, 2024 · Updated last year