Perplexity open source garden for inference technology (☆371, updated Dec 25, 2025)
Alternatives and similar repositories for pplx-garden
Users interested in pplx-garden are comparing it to the libraries listed below.
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit (☆92, updated Jan 26, 2026)
- Perplexity GPU Kernels (☆567, updated Nov 7, 2025)
- Efficient Long-context Language Model Training by Core Attention Disaggregation (☆92, updated this week)
- Open ABI and FFI for Machine Learning Systems (☆355, updated this week)
- Nex Venus Communication Library (☆72, updated Nov 17, 2025)
- AI model training on heterogeneous, geo-distributed resources (☆38, updated Nov 24, 2025)
- NVIDIA Inference Xfer Library (NIXL) (☆898, updated Feb 28, 2026)
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer (☆165, updated Feb 11, 2026)
- ☆13, updated Nov 21, 2024
- A lightweight design for computation-communication overlap (☆223, updated Jan 20, 2026)
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding (☆93, updated Dec 2, 2025)
- Distributed Compiler based on Triton for Parallel Systems (☆1,371, updated Feb 13, 2026)
- MSCCL++: A GPU-driven communication stack for scalable AI applications (☆475, updated Feb 28, 2026)
- ☆19, updated Jun 1, 2025
- FlashInfer: Kernel Library for LLM Serving (☆5,057, updated this week)
- UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g… (☆1,224, updated Feb 28, 2026)
- KV cache store for distributed LLM inference (☆396, updated Nov 13, 2025)
- Training-camp training-track project (☆26, updated Jan 28, 2026)
- ☆14, updated Dec 13, 2024
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) (☆30, updated Jan 22, 2026)
- Building the Virtuous Cycle for AI-driven LLM Systems (☆192, updated Feb 27, 2026)
- Distributed MoE in a Single Kernel [NeurIPS '25] (☆194, updated Feb 27, 2026)
- An auxiliary project analyzing the characteristics of KV in DiT attention (☆33, updated Nov 29, 2024)
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference (☆112, updated Feb 28, 2026)
- Fast OS-level support for GPU checkpoint and restore (☆271, updated Sep 28, 2025)
- A Datacenter-Scale Distributed Inference Serving Framework (☆6,154, updated Feb 28, 2026)
- ☆16, updated Apr 22, 2025
- ☆15, updated Apr 20, 2022
- ☆347, updated Jan 28, 2026
- Venus Collective Communication Library, supported by SII and Infrawaves (☆138, updated this week)
- A fast communication-overlapping library for tensor/expert parallelism on GPUs (☆1,264, updated Aug 28, 2025)
- Thunder Research Group's Collective Communication Library (☆47, updated Jul 8, 2025)
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… (☆469, updated Feb 28, 2026)
- ☆38, updated Aug 7, 2025
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter (☆149, updated Feb 27, 2026)
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive (☆66, updated Dec 11, 2025)
- ☆37, updated Jul 19, 2025
- Prefix-Aware Attention for LLM Decoding (☆29, updated Jan 23, 2026)
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration (☆262, updated Nov 18, 2024)