vllm-project/vllm-nccl

Manages vllm-nccl dependency

☆18

Alternatives and similar repositories for vllm-nccl

Users that are interested in vllm-nccl are comparing it to the libraries listed below.

feifeibear / PSTensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
☆10 · Feb 10, 2022 · Updated 4 years ago
hscspring / bytepiece-rs
The Bytepiece Tokenizer Implemented in Rust.
☆15 · Nov 28, 2023 · Updated 2 years ago
vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Jul 17, 2026 · Updated last week
zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆16 · Aug 10, 2025 · Updated 11 months ago
xdit-project / DiTCacheAnalysis
An auxiliary project analyzing the characteristics of KV in DiT attention.
☆34 · Nov 29, 2024 · Updated last year
zxytim / arithmetic-encoding-compression
☆11 · Apr 3, 2023 · Updated 3 years ago
hpcaitech / Elixir
Elixir: Train a Large Language Model on a Small GPU Cluster
☆16 · Jun 8, 2023 · Updated 3 years ago
UNITES-Lab / Occult
[ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…
☆13 · Apr 17, 2025 · Updated last year
foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆27 · Jul 14, 2026 · Updated 2 weeks ago
fpgasystems / fpga-hyperloglog
FPGA-based HyperLogLog Accelerator
☆12 · Jul 13, 2020 · Updated 6 years ago
JieRen98 / SGEMM-SASS-Annotation
☆21 · Mar 22, 2021 · Updated 5 years ago
GeeeekExplorer / kkbot
A Feishu/Lark AI agent bot
☆15 · Feb 27, 2026 · Updated 5 months ago
jychen21 / Habana-LLM-Viewer
☆13 · Jul 24, 2024 · Updated 2 years ago
miemiekurisu / qwen3asr_cpu
A high-performance C/C++ inference server for Qwen3-ASR, optimized for CPU/GPU real-time streaming speech recognition.
☆15 · Jun 27, 2026 · Updated last month
Azure / msccl-executor-nccl
☆47 · Dec 13, 2024 · Updated last year
feifeibear / PyTorchMemTracer
Depicts the GPU memory footprint during DNN training in PyTorch
☆11 · Nov 17, 2022 · Updated 3 years ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆58 · May 3, 2026 · Updated 2 months ago
xiatwhu / baidu_topk
☆15 · Dec 1, 2023 · Updated 2 years ago
chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17 · Mar 13, 2023 · Updated 3 years ago
TurboNLP / Translate-Demo
A Translation Task using TurboTransformers
☆10 · Dec 17, 2020 · Updated 5 years ago
Olament / Hanzi2PinyinEngine
Hanzi to Pinyin engine in Swift (Pinyin input method engine)
☆13 · Mar 29, 2024 · Updated 2 years ago
LeiWang1999 / Stream-k.tvm
☆20 · Sep 28, 2024 · Updated last year
Jarviswx / tonghuashun_text_matching
Tonghuashun Algorithm Challenge Platform: [September-October bimonthly contest] Text semantic matching with cross-domain transfer
☆11 · Oct 28, 2021 · Updated 4 years ago
LMCache / LMIgnite
☆28 · Jul 29, 2025 · Updated last year
ise-uiuc / NablaFuzz
Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23)
☆27 · Mar 2, 2024 · Updated 2 years ago
jkehne / cuda-malloc-hook
Drop-in library for tracking the memory allocations of CUDA applications
☆14 · Nov 17, 2017 · Updated 8 years ago
Bruce-Lee-LY / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆45 · Feb 27, 2025 · Updated last year
taishan1994 / PPO_Chinese_Generate
☆11 · May 2, 2023 · Updated 3 years ago
FanHansen / creditmodel
creditmodel, model, scorecard, vintage, automatic modeling
☆11 · Aug 10, 2024 · Updated last year
feifeibear / SeeReel
Agent-native Seedance 2.0 short-film studio: cli for AI, canvas for human
☆15 · Jun 14, 2026 · Updated last month
zhuzilin / chatgpt-desktop
Desktop version of ChatGPT; supports manually setting the cookie
☆19 · Dec 9, 2022 · Updated 3 years ago
THUKElab / CCL2023-CLTC-THU_KELab
This repository open-sources our GEC system submitted by THU KELab (sz) in the CCL2023-CLTC Track 1: Multidimensional Chinese Learner Tex…
☆15 · Nov 25, 2023 · Updated 2 years ago
dimdano / faiss-fpga
An FPGA integration and acceleration of the popular FAISS framework for approximate similarity search
☆25 · Jul 20, 2019 · Updated 7 years ago
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆69 · Feb 4, 2025 · Updated last year
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆17 · Aug 4, 2022 · Updated 3 years ago
mrubash1 / keras-semantic-segmentation
Semantic segmentation using Keras
☆15 · Apr 8, 2017 · Updated 9 years ago
xrp-project / BPF-KV
☆28 · Mar 2, 2023 · Updated 3 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆246 · Sep 24, 2023 · Updated 2 years ago