RBLN-SW/vllm-rbln

vLLM plugin for RBLN NPU

☆56

Alternatives and similar repositories for vllm-rbln

Users who are interested in vllm-rbln are comparing it to the libraries listed below.

RBLN-SW / optimum-rbln
⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK for efficient inference on RBLN NPUs.
☆19 • Updated this week
SqueezeBits / Torch-TRTLLM
Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
☆57 • Jul 16, 2025 • Updated last year
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆123 • Mar 6, 2024 • Updated 2 years ago
HabanaAI / Habana_Custom_Kernel
Provides the examples to write and build Habana custom kernels using the HabanaTools
☆26 • Apr 15, 2025 • Updated last year
scale-snu / layered-prefill
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall fre…
☆18 • Mar 9, 2026 • Updated 4 months ago
sjquan / 2022-Study
☆55 • Nov 22, 2022 • Updated 3 years ago
kaist-ina / Trinity-AE
Source code for Trinity (ASPLOS 2026)
☆23 • Apr 24, 2026 • Updated 2 months ago
SemiAnalysisAI / InferenceX-app
Dashboard for InferenceX™, Open Source Continuous Inference
☆36 • Updated this week
efeslab / siloz
☆11 • Aug 23, 2023 • Updated 2 years ago
vllm-project / vllm-gaudi
Community maintained hardware plugin for vLLM on Intel Gaudi
☆49 • Updated this week
torch-spyre / sendnn-inference
Community maintained hardware plugin for vLLM on Spyre
☆52 • Updated this week
wzh99 / GenCoG
GenCoG: A DSL-Based Approach to Generating Computation Graphs for TVM Testing (ISSTA'23)
☆17 • Jul 19, 2023 • Updated 3 years ago
ai-computing / aicomp
☆23 • Jul 10, 2026 • Updated last week
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆41 • Feb 4, 2025 • Updated last year
stephenrkell / donald
The Mickey Mouse of dynamic linkers
☆16 • Nov 15, 2025 • Updated 8 months ago
HPAC / ELAPS
Experimental Linear Algebra Performance Studies
☆12 • Feb 24, 2017 • Updated 9 years ago
mscheong01 / speculative_decoding.c
minimal C implementation of speculative decoding based on llama2.c
☆30 • Jul 15, 2024 • Updated 2 years ago
intel / idxd
☆15 • Jan 7, 2023 • Updated 3 years ago
AlibabaPAI / FLASHNN
☆106 • Sep 9, 2024 • Updated last year
russellb / canhazgpu
A simple GPU reservation tool for single host shared development systems
☆29 • Jul 6, 2026 • Updated 2 weeks ago
FasterDecoding / TEAL
☆167 • Feb 15, 2025 • Updated last year
mxzheng / TrojViT
[CVPR 2023] "TrojViT: Trojan Insertion in Vision Transformers" by Mengxin Zheng, Qian Lou, Lei Jiang
☆15 • Jan 5, 2024 • Updated 2 years ago
scale-snu / Sudoku
A tool for decomposing DRAM address mapping into component-level functions
☆16 • Jun 12, 2025 • Updated last year
microsoft / Panopticon
Panopticon is a complete in-DRAM RowHammer mitigation. This code simulates an implementation of Panopticon in DDR5.
☆14 • Jun 2, 2023 • Updated 3 years ago
cchan / tccl
extensible collectives library in triton
☆97 • Mar 31, 2025 • Updated last year
nullplay / Unified-Convolution-Framework
☆10 • Apr 24, 2023 • Updated 3 years ago
vedantroy / gpu_kernels
☆27 • Jan 8, 2024 • Updated 2 years ago
shuzhangzhong / HybriMoE-Preview
☆17 • Apr 9, 2025 • Updated last year
AidenGeunGeun / OpencodeOrchestra
Multi-layer agent orchestration. PM plans, specialists execute.
☆16 • May 24, 2026 • Updated last month
hlaueriksson / playwright-dotnet-contrib
Contributions to Playwright for .NET 🎭🧪
☆12 • Nov 20, 2023 • Updated 2 years ago
HanGuo97 / hilt
☆40 • Dec 14, 2025 • Updated 7 months ago
scale-snu / DyLLM
☆19 • May 21, 2026 • Updated 2 months ago
vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆387 • Updated this week
scale-snu / LLMSimulator
☆56 • Oct 14, 2025 • Updated 9 months ago
merledu / Google-Summer-of-Code
Project ideas list for Google Summer of Code.
☆18 • Jan 28, 2026 • Updated 5 months ago
radha-patel / SySTeC
Performant kernels for symmetric tensors
☆17 • Aug 22, 2024 • Updated last year
Idein / onnigiri
☆13 • Jul 10, 2026 • Updated last week
CMU-SAFARI / transpimlib
TransPimLib is a library for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, TransPimLib provides …
☆16 • Apr 21, 2023 • Updated 3 years ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 • Oct 5, 2024 • Updated last year